Open Source Dictation: Wrapping up

Ten days ago, I completed the dictation prototype - just in time for this year's Akademy conference.


At Akademy, I gave a talk about open source speech recognition and demoed the dictation prototype.
The slides and the video of the talk are both already available. If you've seen the talk, please consider leaving me some feedback - it's always appreciated.

On Tuesday, I held a two-hour BoF session dedicated to open source speech recognition. First, I quickly re-established the core pillars of an LVCSR system and explained how all those components fit together. Then we talked about where one could potentially source additional material for building higher-quality language and acoustic models and discussed some applications of speech recognition technology relevant to the larger KDE community.

As a side note: This year's Akademy was certainly one of the best conferences I've been to thus far. The talks and BoF sessions were great, the atmosphere inspiring and the people - as always - just awesome. A special thanks also to the local team and all the organizers who put together a program that was simply sublime.

Where's the code?

When I started, I told you that I would share all data created during the course of this project. As promised:

(I decided to share the unadapted acoustic model instead of the final, adapted one I used in the video because the latter is specifically tailored to my own voice and I suppose it is not really useful for anyone but me. If you're really interested in the adapted model for the sake of reproducibility, I'm of course happy to share it as well.)

As I mentioned repeatedly, this is "just" a prototype and absolutely not intended for end-user consumption. Even with all the necessary data files, setting up a working system is anything but trivial. If you're looking for a ready-to-use system - and I can't stress this enough - Simon is not (yet) it!

Where to go from here?

As many of you will have noticed, the project was partly also intended to find potentially interested contributors to join me in building open source speech recognition systems. In this regard, I'm happy to report that in the last 10 days, quite a few people contacted me and asked how to get involved.

I'll hold an IRC meeting in the coming week to discuss possible tasks and how to get started. If you're interested in joining the meeting, please get in touch.



I have been following your posts with great interest. I am exploring the opportunities for speech recognition of Indian-accented English. I am using pocketsphinx for the experiments. My experiments give me a WER that is poor compared to the Google Speech API, even for a small vocabulary. I have used an FSG that contains around 500 words as the grammar. My initial guess is that the acoustic model plays the most important role in overall recognition. However, I have no idea what it takes to adapt the acoustic model for general-purpose use. Can you suggest how much time and effort is required for adaptation?
Thanks Peter. Your blog posts are very informative and I learnt many new things from them.

Peter Grasch's picture


It's tough to give even rough estimates without more information. But you should check out the resources on the SPHINX homepage - they should give you an idea:

Best regards,

I'm looking forward to the summer release of the dictation!
