Thanks to everyone who participated in the poll last week about what speech recognition project you’d most like to see.
The week is over, and Dictation has emerged as a clear winner!
As promised, I’ll now try to build a proof-of-concept prototype of such a system in time for this year’s Akademy.
With a system as complex as continuous dictation, there is obviously a wide range of challenges.
Here are just a few of the problems I’ll need to tackle in the next two weeks:
- Acoustic model: The obvious elephant in the room. Any good speech recognition system needs an accurate representation of how it expects users to pronounce the words in its dictionary.
- Language model: “English” is simply not good enough. After all, when did you last try to write “Huzzah!”? We not only need to restrict the vocabulary to a sensible subset but also gain a pretty good understanding of what a user might intend to write. This is important not only to avoid computationally prohibitive vocabulary sizes but also to differentiate between, e.g., “@ home” and “at home”.
- Dictation application: Even given a perfect speech recognition system, dictation is still some way off. You’ll also need some form of software that takes the recognition result, applies formatting (casing, etc.), and allows users to correct recognition mistakes, change the structure, etc.
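To make the language model point a bit more concrete, here is a minimal sketch of how word context can decide between “@ home” and “at home”. It is not the prototype’s code: the tiny corpus, the add-one smoothed bigram model, and all function names (`train_bigrams`, `score`, `pick`) are assumptions made purely for illustration.

```python
from collections import Counter

# Toy training text (an assumption for this sketch); a real system would
# need a large corpus matched to what users actually dictate.
CORPUS = [
    "i am at home today",
    "she stayed at home",
    "mail me at work",
    "send it to user @ example.org",
]


def train_bigrams(sentences):
    """Count single words and adjacent word pairs, with a start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def score(tokens, unigrams, bigrams, vocab_size):
    """Product of add-one smoothed bigram probabilities for a word sequence."""
    tokens = ["<s>"] + tokens
    probability = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        probability *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return probability


def pick(candidates, unigrams, bigrams):
    """Return the candidate transcription the model considers most likely."""
    vocab_size = len(unigrams)
    return max(
        candidates,
        key=lambda text: score(text.split(), unigrams, bigrams, vocab_size),
    )


UNIGRAMS, BIGRAMS = train_bigrams(CORPUS)
best = pick(["i am @ home", "i am at home"], UNIGRAMS, BIGRAMS)
print(best)
```

Both candidates sound the same, so the acoustic model alone cannot separate them; here the counts of neighbouring words tip the balance. A real dictation system would use a much larger model and combine its score with the acoustic score rather than rely on it alone.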
Obviously, I won’t be able to solve all these issues in this short time frame, but I’ll do my very best to show off a presentable prototype that addresses all these areas. Watch this blog for updates over the coming weeks!