Open Source Dictation: Wrapping up

Ten days ago, I completed the dictation prototype - just in time for this year's Akademy conference.


At Akademy, I gave a talk about open source speech recognition and demoed the dictation prototype.
The slides and the video of the talk are both already available. If you've seen the talk, please consider leaving me some feedback - it's always appreciated.

On Tuesday, I held a 2 hour BoF session dedicated to open source speech recognition: First, I quickly re-established the core pillars of an LVCSR system and explained how all those components fit together. Then we talked about where one could potentially source additional material for building higher quality language and acoustic models and discussed some applications of speech recognition technology relevant to the larger KDE community.

As a side note: This year's Akademy was certainly one of the best conferences I've been to thus far. The talks and BoF sessions were great, the atmosphere inspired and the people - as always - just awesome. A special thanks also to the local team and all the organizers who put together a program that was simply sublime.

Where's the code?

When I started, I told you that I would share all data created during the course of this project. As promised:

(I decided to share the unadapted acoustic model instead of the final, adapted one I used in the video because the latter is specifically tailored to my own voice, so I suppose it is not really useful for anyone but me. If you're interested in the adapted model for the sake of reproducibility, I'm of course happy to share it as well.)

As I mentioned repeatedly, this is "just" a prototype and absolutely not intended for end-user consumption. Even with all the necessary data files, setting up a working system is anything but trivial. If you're looking for a ready-to-use system, I can't stress this enough: Simon is not (yet) it!

Where to go from here?

As many of you will have noticed, the project was partly also intended to find potentially interested contributors to join me in building open source speech recognition systems. In this regard, I'm happy to report that in the last 10 days, quite a few people contacted me and asked how to get involved.

I'll hold an IRC meeting in the coming week to discuss possible tasks and how to get started. If you're interested in joining the meeting, please get in touch.



I have been following your posts with great interest. I am exploring the opportunities for speech recognition in Indian-accented English. I am using pocketsphinx for the experiments. My experiments give me a WER that is poor compared to the Google Speech API, even for a small vocabulary. I have used an FSG containing around 500 words as the grammar. My initial guess is that the acoustic model plays the most important role in overall recognition. However, I have no idea what it takes to adapt the acoustic model for general-purpose use. Can you suggest how much time and effort is required for adaptation?
Thanks Peter. Your blog posts are very informative and I learnt many new things from them.

Peter Grasch


It's tough to give even rough estimates without more information. But you should check out the resources on the SPHINX homepage - they should give you an idea:

Best regards,
