speech to text – word segmentation
op.recognize is a speech recognition (speech to text) external based on the Sphinx library. It translates the incoming signal into text and gives the time positions of the recognized phonemes and words within the audio recorded into a buffer~. No specific voice training is needed.
op.recognize is built on a hidden Markov model (HMM) system working with phonemes. It can work with several languages using a language dictionary: 63,998 English words and 107,227 French words, for instance.
The more flexible the database, the more recognition errors are likely. There are three ways to prepare the recognition:
- a grammar file in the JSpeech Grammar Format (JSGF) describing the order of the words to recognize.
- a trigram language model file generated from the text to recognize with the Carnegie Mellon University Statistical Language Modeling toolkit.
- no preparation at all, for cases where what will be said is unknown. This method is more general but less accurate.
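To illustrate the first option, here is a minimal sketch that generates a small JSGF grammar accepting a fixed set of sentences. The helper name `make_jsgf`, the rule name `<utterance>` and the example sentences are assumptions made for illustration; how op.recognize loads the resulting file is not covered here.

```python
def make_jsgf(grammar_name, sentences):
    """Return JSGF text that accepts exactly one of the given sentences."""
    # Each sentence becomes one parenthesised alternative, words in order.
    alternatives = " | ".join(
        "( " + " ".join(s.lower().split()) + " )" for s in sentences
    )
    lines = [
        "#JSGF V1.0;",                               # JSGF header
        f"grammar {grammar_name};",                  # grammar declaration
        f"public <utterance> = {alternatives};",     # hypothetical rule name
    ]
    return "\n".join(lines) + "\n"


grammar_text = make_jsgf("demo", ["hello world", "good morning everyone"])
print(grammar_text)
```

The stricter the grammar, the fewer word sequences the recognizer has to choose between, which is the trade-off between flexibility and errors described above.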
One interesting application is concatenative synthesis using words or phonemes segmentation.
Real-time voice alignment is another possible application, for instance to make someone talk with someone else's voice. Two sentences are spoken at different speeds. After segmentation, one of the sentences is stretched in order to fit the other one (supervp.play~ is used for the stretch).
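The alignment step can be sketched as a per-word stretch factor computed from the two segmentations. This is only an illustration: the `(word, start, end)` tuple format, the function name `stretch_factors` and the timings are hypothetical, both recordings are assumed to contain the same words in the same order, and the actual time stretching (done by supervp.play~) is not shown.

```python
def stretch_factors(source, target):
    """For each word, return how much the source segment must be stretched
    (factor > 1) or compressed (factor < 1) to match the target duration.

    Both arguments are lists of (word, start_seconds, end_seconds).
    """
    if [w for w, _, _ in source] != [w for w, _, _ in target]:
        raise ValueError("both segmentations must contain the same words in order")

    factors = []
    for (word, s0, s1), (_, t0, t1) in zip(source, target):
        source_duration = s1 - s0
        if source_duration <= 0:
            raise ValueError(f"empty source segment for {word!r}")
        factors.append((word, (t1 - t0) / source_duration))
    return factors


# Hypothetical segmentations of the same sentence spoken fast and slowly.
fast = [("hello", 0.0, 0.25), ("world", 0.25, 0.75)]
slow = [("hello", 0.0, 0.5), ("world", 0.5, 1.0)]

factors = stretch_factors(fast, slow)
print(factors)  # [('hello', 2.0), ('world', 1.0)]
```

Working word by word, rather than with a single global factor, keeps each word boundary of the stretched recording aligned with the reference.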
Other applications could include text following in theatre or installations, dictation, translation, chatbots, summarization and concatenative voice synthesis (recreating sentences from existing segments).
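Recreating sentences from existing segments can be sketched as a lookup from words to their time positions in a recording. The segment format, the function names `build_index` and `plan_sentence`, and the data below are assumptions for illustration; the sketch only produces the list of regions to play and does not handle audio.

```python
def build_index(segments):
    """Map each word to the (start, end) of its first recorded occurrence."""
    index = {}
    for word, start, end in segments:
        index.setdefault(word, (start, end))
    return index


def plan_sentence(index, sentence):
    """Return the ordered (start, end) regions needed to say the sentence."""
    plan = []
    for word in sentence.lower().split():
        if word not in index:
            raise KeyError(f"no recorded segment for {word!r}")
        plan.append(index[word])
    return plan


# Hypothetical word segmentation of one recording, in seconds.
recorded = [
    ("the", 0.0, 0.2),
    ("cat", 0.2, 0.6),
    ("sees", 0.6, 1.0),
    ("the", 1.0, 1.2),
    ("dog", 1.2, 1.6),
]

index = build_index(recorded)
playlist = plan_sentence(index, "the dog sees the cat")
print(playlist)
```

The same idea applies to phoneme segments, which would allow words that were never recorded to be assembled from smaller units.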