Open Source Dictation: Wrapping up

Ten days ago, I completed the dictation prototype - just in time for this year's Akademy conference.

Akademy

At Akademy, I gave a talk about open source speech recognition and demoed the dictation prototype.
The slides and the video of the talk are both already available. If you've seen the talk, please consider leaving me some feedback - it's always appreciated.

On Tuesday, I held a two-hour BoF session dedicated to open source speech recognition. First, I quickly re-established the core pillars of an LVCSR (large-vocabulary continuous speech recognition) system and explained how all those components fit together. Then we talked about where one could source additional material for building higher-quality language and acoustic models, and discussed some applications of speech recognition technology relevant to the larger KDE community.

As a side note: This year's Akademy was certainly one of the best conferences I've been to thus far. The talks and BoF sessions were great, the atmosphere inspired and the people - as always - just awesome. Special thanks also go to the local team and all the organizers who put together a program that was simply sublime.

Where's the code?

When I started, I told you that I would share all data created during the course of this project. As promised:

(I decided to share the unadapted acoustic model instead of the final, adapted one I used in the video because the latter is specifically tailored to my own voice and so is not really useful for anyone but me. If you're really interested in the adapted model for the sake of reproducibility, I'm of course happy to share it as well.)

As I mentioned repeatedly, this is "just" a prototype and absolutely not intended for end-user consumption. Even with all the necessary data files, setting up a working system is anything but trivial. If you're looking for a ready-to-use system, I can't stress this enough: Simon is not (yet) it!

Where to go from here?

As many of you will have noticed, the project was also partly intended to find interested contributors to join me in building open source speech recognition systems. In this regard, I'm happy to report that in the last 10 days, quite a few people contacted me and asked how to get involved.

I'll hold an IRC meeting in the coming week to discuss possible tasks and how to get started. If you're interested in joining the meeting, please get in touch.

Comments

I have been following your posts with great interest. I am exploring opportunities for speech recognition of Indian-accented English. I am using pocketsphinx for the experiments. My experiments give me a WER that is poor compared to the Google Speech API, even for a small vocabulary. I have used an FSG that contains around 500 words as the grammar. My initial guess is that the acoustic model plays the most important role in overall recognition. However, I have no idea what it takes to adapt the acoustic model for general-purpose use. Can you suggest how much time and effort is required for adaptation?
Thanks, Peter. Your blog posts are very informative and I have learnt many new things from them.

Peter Grasch's picture

Hi,

it's tough to give even rough estimates without more information. But you should check out the resources on the SPHINX homepage - they should give you an idea:
http://cmusphinx.sourceforge.net/wiki/tutorialam
http://cmusphinx.sourceforge.net/wiki/tutorialadapt

Best regards,
Peter
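
For anyone comparing recognizers as described in the question above, it helps to pin down how WER (word error rate) is computed: the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not code from Simon or pocketsphinx. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the result can exceed 1.0 when the recognizer outputs many more words than the reference contains.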

I've been looking forward to the summer release of the dictation!

Hello,

I tried downloading the file from the URL:
http://files.kde.org/accessibility/Simon/lm/cmudict.languageProfile.mirr...

But after downloading, I got the following message and the data was junk.

File information
Filename: cmudict.languageProfile
Size: 15M (16249587 bytes)
Last modified: Mon, 08 Sep 2014 09:57:57 GMT (Unix time: 1410170277)
SHA-256 Hash: ae4e7455e0e8a542a2ef8b947f70fed73b7c44be9e3dc73d05137b551779aee0
SHA-1 Hash: 8efb910677a3ad7016b65bec2bf26ce3c6be122d
MD5 Hash: 5e99a7b1f98f138320287d0852400322

Download file

There was a problem opening the file /tmp/cmudict.languageProfile.
The file you opened has some invalid characters. If you continue editing this file you could corrupt this document.
You can also choose another character encoding and try again.

The data looks as follows:

€#csequitur
Model
q#)q#}q#(U#sequiturq#(csequitur
Sequitur
q#o}q#(U#rightInventoryq#(csymbols
SymbolInventory
q#oq }q
(U#listq#]q (U#__void__q
U#__term__q#X#\00\00\00KX#\00\00\00LX#\00\00\00OWq#X#\00\00\00ZX#\00\00\00WX#\00\00\00TX#\00\00\00DX#\00\00\00AHq#X#\00\00\00BX#\00\00\00EHq#X#\00\00\00NX#\00\00\00VX#\00\00\00IHq#X#\00\00\00SX#\00\00\00SHq#X#\00\00\00AAq#X#\00\00\00RX#\00\00\00PX#\00\00\00AYq#X#\00\00\00ERq#X#\00\00\00AEq#X#\00\00\00MX#\00\00\00AOq#X#\00\00\00NGq#X#\00\00\00GX#\00\00\00THq#X#\00\00\00IYq#X#\00\00\00FX#\00\00\00DHq#X#\00\00\00HHq#X#\00\00\00UHq#X#\00\00\00OYq#X#\00\00\00CHq X#\00\00\00EYq!X#\00\00\00UWq"X#\00\00\00AWq#X#\00\00\00JHq$X#\00\00\00YX#\00\00\00ZHq%eU#dirq&}q'(h#K#h#K%h#K#X#\00\00\00YK'h#K#h K"h$K&h%K(h#K#h#K#h#K#h#K#h#K#X#\00\00\00BK
h#K#X#\00\00\00DK#X#\00\00\00GK#X#\00\00\00FK#h#K X#\00\00\00KK#X#\00\00\00MK#X#\00\00\00LK#h#K#X#\00\00\00NKX#\00\00\00PK#X#\00\00\00SK#X#\00\00\00RK#h!K#X#\00\00\00TK#X#\00\00\00WK#X#\00\00\00VK
h#K#X#\00\00\00ZK#h#K#h#K#h"K$h#K#h#K h#K!h#K#uubU#termq(K#U
leftInventoryq)(h#oq*}q+(h#]q,(h
.
.
.
.
.
.

I tried downloading from a few mirror sites, but every time the same problem and error message were displayed.

I am on Ubuntu (desktop).

Please help me get the speech-to-text (dictation) software downloaded, along with the acoustic and language data models.

Thanks.

H K Suhas
hosursuhas@gmail.com

Peter Grasch's picture

Hey Suhas,

to load the language profile go to Settings > Configure Simon > Speech Model > Language Profile > Load.
Please keep in mind that dictation is not yet supported.

Best regards,
Peter
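
A note for readers who run into the same message: the dump quoted in the question does not look corrupted. A leading byte that renders as "€" (0x80 in Windows-1252) followed by "csequitur" and "Model" is what a binary Python pickle looks like when opened in a text editor, so the file is binary data rather than text. The sketch below is one way to check a download without opening it in an editor. The helper names `sha256_of` and `looks_like_binary_pickle` are made up for illustration, the expected hash is the SHA-256 value from the mirror page quoted above, and the local file name is an assumption.

```python
import hashlib

# SHA-256 value as listed on the mirror page quoted in the comment above.
EXPECTED_SHA256 = "ae4e7455e0e8a542a2ef8b947f70fed73b7c44be9e3dc73d05137b551779aee0"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def looks_like_binary_pickle(path):
    """Check for the PROTO marker that starts pickle protocol 2 and later.

    Only the first two bytes are inspected; the file is never unpickled,
    since unpickling data from an untrusted source can execute code.
    """
    with open(path, "rb") as handle:
        header = handle.read(2)
    return len(header) == 2 and header[0] == 0x80 and 2 <= header[1] <= 5


if __name__ == "__main__":
    path = "cmudict.languageProfile"  # assumed local file name
    print("hash matches:", sha256_of(path) == EXPECTED_SHA256)
    print("binary pickle header:", looks_like_binary_pickle(path))
```

If the hash matches, the download is intact and the file can be loaded through the settings dialog described in the reply above instead of being opened in an editor.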

Hey Peter,

Thanks a lot for the help.

But Simon Listens and SPHINX can also be used for Speech to Text and not just Text to Speech.

Thanks and Regards.

Suhas

Peter Grasch's picture

We primarily do Speech to Text, not the other way around, yes.
