speech to text – word segmentation


op.recognize is a speech recognition (speech-to-text) external based on the Sphinx library. It translates the incoming signal into text and gives the time positions of the phonemes and recognized words recorded into a buffer~. No speaker-specific training is needed.
op.recognize is built on a hidden Markov model (HMM) system that works with phonemes. It can handle several languages by means of a language dictionary, for instance 63,998 English words or 107,227 French words.
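
To illustrate how a phoneme HMM can yield both a transcription and time positions, here is a minimal Viterbi decoding sketch in Python. It is not the code used by op.recognize or Sphinx: the three-phoneme model, its probabilities, and the frame labels ("k", "a", "t") are made-up assumptions chosen only to keep the example small.

```python
import math
from itertools import groupby

def log(p):
    # Log-probability that tolerates zero probabilities.
    return math.log(p) if p > 0 else float("-inf")

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state (phoneme) sequence for the observations."""
    # Each column maps a state to (best log-score ending there, previous state).
    best = [{s: (log(start_p[s]) + log(emit_p[s][observations[0]]), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            score, prev = max((best[-1][p][0] + log(trans_p[p][s]), p)
                              for p in states)
            column[s] = (score + log(emit_p[s][obs]), prev)
        best.append(column)
    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path

def to_segments(path):
    """Group consecutive identical states into (phoneme, start_frame, end_frame)."""
    segments, start = [], 0
    for state, run in groupby(path):
        length = len(list(run))
        segments.append((state, start, start + length))
        start += length
    return segments

# Hypothetical left-to-right model for the word "cat" (phonemes K, AE, T).
states = ["K", "AE", "T"]
start_p = {"K": 1.0, "AE": 0.0, "T": 0.0}
trans_p = {
    "K":  {"K": 0.5, "AE": 0.5, "T": 0.0},
    "AE": {"K": 0.0, "AE": 0.6, "T": 0.4},
    "T":  {"K": 0.0, "AE": 0.0, "T": 1.0},
}
emit_p = {
    "K":  {"k": 0.8, "a": 0.1, "t": 0.1},
    "AE": {"k": 0.1, "a": 0.8, "t": 0.1},
    "T":  {"k": 0.1, "a": 0.1, "t": 0.8},
}

frames = ["k", "k", "a", "a", "a", "t"]  # one made-up label per audio frame
path = viterbi(frames, states, start_p, trans_p, emit_p)
segments = to_segments(path)
print(path)
print(segments)
```

Multiplying the frame indices in the segments by the frame duration would give phoneme time positions in seconds.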

search graph - hmm with phonemes

The more flexible the database, the more recognition errors are likely. Three ways of preparing the recognition are available:

An acoustic model is needed. An English one from Sphinx is provided with the external. You can create your own with SphinxTrain for other languages, accents, etc.

One interesting application is concatenative synthesis using word or phoneme segmentation.
Real-time voice alignment, which makes someone talk with the voice of someone else, is another possible application. Two sentences are said at different speeds. After segmentation, one of the sentences is stretched in order to fit the other one (using supervp.play~ for the stretch).
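
The alignment step can be sketched as a per-word stretch factor computed from the two segmentations. This Python sketch assumes both recordings contain the same words in the same order and that each segment is a (word, start, end) tuple in seconds; the tuple layout and the function name are illustrative, not the output format of op.recognize. The stretching itself is left to the time-stretch engine.

```python
def stretch_factors(source, target):
    """Return (word, factor) pairs; factor > 1 means the source word must be lengthened."""
    if [w for w, _, _ in source] != [w for w, _, _ in target]:
        raise ValueError("both segmentations must contain the same words in order")
    factors = []
    for (word, s_start, s_end), (_, t_start, t_end) in zip(source, target):
        source_duration = s_end - s_start
        target_duration = t_end - t_start
        factors.append((word, target_duration / source_duration))
    return factors

# Made-up segmentations of the same two-word sentence said at different speeds.
source = [("hello", 0.0, 0.5), ("world", 0.5, 1.5)]
target = [("hello", 0.0, 1.0), ("world", 1.0, 1.5)]

factors = stretch_factors(source, target)
print(factors)
```

Here "hello" must be played twice as slowly and "world" twice as fast for the source voice to follow the target timing.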
Other applications could include text following in theatre or installations, dictation, translation, chatbots, summarization and concatenative voice synthesis (recreating sentences from existing segments).
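
Recreating a sentence from existing segments can be sketched as a lookup from words to their recorded time ranges. This is a minimal Python illustration under assumptions: the (word, start, end) tuples in seconds and the helper names are hypothetical, and the sketch simply picks the first recorded occurrence of each word.

```python
from collections import defaultdict

def build_index(segments):
    """Map each recognized word to the list of (start, end) ranges where it occurs."""
    index = defaultdict(list)
    for word, start, end in segments:
        index[word.lower()].append((start, end))
    return index

def plan_sentence(sentence, index):
    """Return the buffer ranges to play, in order, to say the new sentence."""
    playlist = []
    for word in sentence.lower().split():
        if word not in index:
            raise KeyError(f"no recorded segment for {word!r}")
        playlist.append(index[word][0])  # first occurrence; other choices are possible
    return playlist

# Made-up segmentation of one recorded sentence.
recorded = [
    ("the", 0.0, 0.2), ("cat", 0.2, 0.6), ("sees", 0.6, 1.0),
    ("the", 1.0, 1.15), ("dog", 1.15, 1.6),
]

index = build_index(recorded)
playlist = plan_sentence("the dog sees the cat", index)
print(playlist)
```

Playing the listed ranges back to back would produce the new sentence in the recorded voice.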

op.recognize help patch