Quick'n'Dirty Phoneme Extractor is a little Win32 command-line app (32KB) which uses the Microsoft SAPI speech recognition engine (in dictation mode) to estimate phoneme and timing information from standard PCM wave audio files (*.wav) containing recorded speech. The output data can then be used for rough lipsync animations in synthetic agents (by a simple mapping of phonemes to visemes). The recognition accuracy is not particularly high (as you'll discover if you use the -t command-line option), but the resultant phoneme sequences sound very similar to the original speech whether or not the word recognition is exactly correct, so the resultant viseme sequences look very similar to what one would expect. In fact, I've found that the results can be surprisingly effective. You can even get passable lipsync animations for non-English speech files (and get a good laugh out of the 'English' transcriptions generated in the process).
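To give an idea of what "a simple mapping of phonemes to visemes" might look like, here is a minimal Python sketch. It is not part of the app. The input layout (a list of phoneme, start, end tuples in milliseconds), the viseme names and the table itself are all assumptions made up for illustration; a real table would need to cover the full phoneme set and match whatever mouth shapes your agent supports.

```python
# Hypothetical, heavily simplified phoneme-to-viseme table.
# The viseme names and groupings are illustrative assumptions,
# not the mapping used by any particular engine or by the app.
PHONEME_TO_VISEME = {
    "p": "closed", "b": "closed", "m": "closed",
    "f": "lip_teeth", "v": "lip_teeth",
    "aa": "open", "ae": "open", "ah": "open",
    "ow": "round", "uw": "round", "w": "round",
    "iy": "wide", "ih": "wide", "eh": "wide",
}


def to_visemes(phonemes, default="neutral"):
    """Map (phoneme, start_ms, end_ms) tuples to (viseme, start_ms, end_ms).

    Consecutive entries that map to the same viseme and touch in time
    are merged, so the animation does not re-trigger the same mouth shape.
    Phonemes missing from the table fall back to `default`.
    """
    track = []
    for phoneme, start, end in phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme.lower(), default)
        if track and track[-1][0] == viseme and track[-1][2] == start:
            # Extend the previous viseme instead of adding a duplicate.
            track[-1] = (viseme, track[-1][1], end)
        else:
            track.append((viseme, start, end))
    return track
```

Because many phonemes share one mouth shape, a wrongly recognised word often still lands on the right visemes, which is why rough recognition can be good enough for lipsync.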
New in v1.10: As an alternative to using the dictation grammar, a 'hints' file can be supplied which provides accurate transcriptions for some or all of the prompts. For each of the prompts specified in the hints file, the app will compile a command grammar with a single rule which corresponds word-for-word to the transcription and will use this grammar instead of the dictation grammar.
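As a rough illustration of the single-rule idea, the Python sketch below turns one transcription into a SAPI 5 style XML text grammar containing one top-level rule. This is only an assumption about how such a grammar could be written out; the app may well build its grammar programmatically instead, and the function name, the rule name and the way the transcription is passed in are all hypothetical. The hints file format itself is described in ReadMe.txt and is not reproduced here.

```python
from xml.sax.saxutils import escape, quoteattr


def single_rule_grammar(transcription, rule_name="prompt"):
    """Return XML grammar text with one rule matching the transcription.

    Hypothetical helper: the rule name and the overall approach are
    assumptions for illustration, not the app's actual implementation.
    """
    # Collapse runs of whitespace so the phrase is one clean word sequence.
    phrase = escape(" ".join(transcription.split()))
    return (
        '<GRAMMAR LANGID="409">\n'  # 409 = U.S. English language id
        f'  <RULE NAME={quoteattr(rule_name)} TOPLEVEL="ACTIVE">\n'
        f'    <P>{phrase}</P>\n'  # the whole prompt as a single phrase
        '  </RULE>\n'
        '</GRAMMAR>\n'
    )
```

With only one possible phrase to match, the recogniser's job reduces to aligning known words against the audio, which is why a hinted prompt should give more accurate phonemes and timings than dictation mode.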
New in v1.20: You can use the -x command-line option to output the phoneme sequence in the XML format used by CryENGINE2, rather than the default output format.
You will need to have the Microsoft U.S. English speech recognition engine installed in order to run this app. You can either install this via Microsoft Office XP setup (so I'm told) or by downloading and installing the Microsoft Speech SDK 5.1.
Click here to download the app.
Run phonemes.exe without arguments for usage instructions. See the accompanying ReadMe.txt for further information.
If you find this app useful, please drop me a note and let me know!