Open Source Dictation: Wrapping up

Ten days ago, I completed the dictation prototype – just in time for this year’s Akademy conference.


At Akademy, I gave a talk about open source speech recognition and demoed the dictation prototype.
The slides and the video of the talk are both already available. If you’ve seen the talk, please consider leaving me some feedback – it’s always appreciated.

On Tuesday, I held a two-hour BoF session dedicated to open source speech recognition. First, I quickly re-established the core pillars of an LVCSR system and explained how all those components fit together. Then we talked about where one could potentially source additional material for building higher-quality language and acoustic models, and discussed some applications of speech recognition technology relevant to the larger KDE community.

As a side note: This year’s Akademy was certainly one of the best conferences I’ve been to thus far. The talks and BoF sessions were great, the atmosphere inspiring and the people – as always – just awesome. A special thanks also to the local team and all the organizers who put together a program that was simply sublime.

Where’s the code?

When I started, I told you that I would share all data created during the course of this project. As promised:

(I decided to share the unadapted acoustic model instead of the final, adapted one I used in the video because the latter is specifically tailored to my own voice, and I suppose that it is not really useful for anyone but me. If you’re really interested in the adapted model for the sake of reproducibility, I’m of course happy to share it as well.)

As I mentioned repeatedly, this is “just” a prototype and absolutely not intended for end-user consumption. Even with all the necessary data files, setting up a working system is anything but trivial. If you’re looking for a ready-to-use system, I can’t stress this enough: Simon is not (yet) it!

Where to go from here?

As many of you will have noticed, the project was also partly intended to find potentially interested contributors to join me in building open source speech recognition systems. In this regard, I’m happy to report that in the last ten days, quite a few people contacted me and asked how to get involved.

I’ll hold an IRC meeting in the coming week to discuss possible tasks and how to get started. If you’re interested in joining the meeting, please get in touch.


Peter Grasch