Open Source Dictation: Wrapping up

Ten days ago, I completed the dictation prototype – just in time for this year’s Akademy conference.

Akademy

At Akademy, I gave a talk about open source speech recognition and demoed the dictation prototype.
The slides and the video of the talk are both already available. If you’ve seen the talk, please consider leaving me some feedback – it’s always appreciated.

On Tuesday, I held a two-hour BoF session dedicated to open source speech recognition. First, I quickly recapped the core pillars of an LVCSR (large-vocabulary continuous speech recognition) system and explained how those components fit together. Then we talked about where one could source additional material for building higher-quality language and acoustic models, and discussed some applications of speech recognition technology relevant to the larger KDE community.

As a side note: This year’s Akademy was certainly one of the best conferences I’ve been to thus far. The talks and BoF sessions were great, the atmosphere inspired and the people – as always – just awesome. A special thanks also to the local team and all the organizers, who put together a program that was simply sublime.

Where’s the code?

When I started, I told you that I would share all data created during the course of this project. As promised:

(I decided to share the unadapted acoustic model instead of the final, adapted one I used in the video because the latter is specifically tailored to my own voice, and I suppose it is not really useful for anyone but me. If you’re really interested in the adapted model for the sake of reproducibility, I’m of course happy to share it as well.)

As I mentioned repeatedly, this is “just” a prototype and absolutely not intended for end-user consumption. Even with all the necessary data files, setting up a working system is anything but trivial. If you’re looking for a ready-to-use system, I can’t stress this enough: Simon is not (yet) it!

Where to go from here?

As many of you will have noticed, the project was also partly intended to find potential contributors to join me in building open source speech recognition systems. In this regard, I’m happy to report that in the last 10 days, quite a few people contacted me and asked how to get involved.

I’ll hold an IRC meeting in the coming week to discuss possible tasks and how to get started. If you’re interested in joining the meeting, please get in touch.

Peter Grasch

8 Comments

  1. Very Impressive Project

    I have been following your posts with great interest. I am exploring the opportunities for speech recognition of Indian-accented English. I am using pocketsphinx for the experiments. My experiments give me a WER that is poor compared to the Google Speech API, even for a small vocabulary. I have used an FSG which contains around 500 words as the grammar. My initial guess is that the acoustic model plays the most important role in overall recognition. However, I have no idea what it takes to adapt the acoustic model for general-purpose use. Can you suggest how much time and effort is required for adaptation?
    Thanks Peter. Your blog posts are very informative and I have learnt many new things from them.
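
For readers unfamiliar with the WER (word error rate) mentioned in this comment: it is commonly computed as the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used by the commenter or by Simon. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name); splits on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

With this definition, one wrong word in a three-word reference gives a WER of one third. Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.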

  2. Wondering if we’ll be seeing this soon?

    I’ve been looking forward to the summer release of the dictation!

  3. Unable to download the application and data files of Dictation

    Hello,

    I tried downloading the file from the URL:
    http://files.kde.org/accessibility/Simon/lm/cmudict.languageProfile.mirr

    But after downloading I got the following message, and the data was junk.

    File information
    Filename: cmudict.languageProfile
    Size: 15M (16249587 bytes)
    Last modified: Mon, 08 Sep 2014 09:57:57 GMT (Unix time: 1410170277)
    SHA-256 Hash: ae4e7455e0e8a542a2ef8b947f70fed73b7c44be9e3dc73d05137b551779aee0
    SHA-1 Hash: 8efb910677a3ad7016b65bec2bf26ce3c6be122d
    MD5 Hash: 5e99a7b1f98f138320287d0852400322

    Download file

    There was a problem opening the file /tmp/cmudict.languageProfile.
    The file you opened has some invalid characters. If you continue editing this file you could corrupt this document.
    You can also choose another character encoding and try again.

    The data looks as follows:

    €#csequitur
    Model
    q#)q#}q#(U#sequiturq#(csequitur
    Sequitur
    q#o}q#(U#rightInventoryq#(csymbols
    SymbolInventory
    q#oq }q
    (U#listq#]q (U#__void__q
    U#__term__q#X#\00\00\00KX#\00\00\00LX#\00\00\00OWq#X#\00\00\00ZX#\00\00\00WX#\00\00\00TX#\00\00\00DX#\00\00\00AHq#X#\00\00\00BX#\00\00\00EHq#X#\00\00\00NX#\00\00\00VX#\00\00\00IHq#X#\00\00\00SX#\00\00\00SHq#X#\00\00\00AAq#X#\00\00\00RX#\00\00\00PX#\00\00\00AYq#X#\00\00\00ERq#X#\00\00\00AEq#X#\00\00\00MX#\00\00\00AOq#X#\00\00\00NGq#X#\00\00\00GX#\00\00\00THq#X#\00\00\00IYq#X#\00\00\00FX#\00\00\00DHq#X#\00\00\00HHq#X#\00\00\00UHq#X#\00\00\00OYq#X#\00\00\00CHq X#\00\00\00EYq!X#\00\00\00UWq”X#\00\00\00AWq#X#\00\00\00JHq$X#\00\00\00YX#\00\00\00ZHq%eU#dirq&}q'(h#K#h#K%h#K#X#\00\00\00YK’h#K#h K”h$K&h%K(h#K#h#K#h#K#h#K#h#K#X#\00\00\00BK
    h#K#X#\00\00\00DK#X#\00\00\00GK#X#\00\00\00FK#h#K X#\00\00\00KK#X#\00\00\00MK#X#\00\00\00LK#h#K#X#\00\00\00NKX#\00\00\00PK#X#\00\00\00SK#X#\00\00\00RK#h!K#X#\00\00\00TK#X#\00\00\00WK#X#\00\00\00VK
    h#K#X#\00\00\00ZK#h#K#h#K#h”K$h#K#h#K h#K!h#K#uubU#termq(K#U
    leftInventoryq)(h#oq*}q+(h#]q,(h
    .
    .
    .
    .
    .
    .

    I tried downloading from a few mirror sites, but every time the same problem and error message appeared.

    I am on the Ubuntu desktop OS.

    Please help me to get the speech-to-text (dictation) software downloaded, along with the acoustic and language data models.

    Thanks.

    H K Suhas

  4. Hey Suhas,

    to load the language profile go to Settings > Configure Simon > Speech Model > Language Profile > Load.
    Please keep in mind that dictation is not yet supported.

    Best regards,
    Peter
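
The dump quoted in comment 3 looks like binary data rather than text (it resembles a serialized Python object), so a text editor's encoding warning does not by itself mean the download is corrupt. One way to check a download is to compare its SHA-256 hash against the value listed on the download page quoted in that comment. The sketch below assumes the file was saved locally; the function names and the example path are hypothetical.

```python
import hashlib

# SHA-256 hash as quoted from the download page in comment 3.
EXPECTED_SHA256 = "ae4e7455e0e8a542a2ef8b947f70fed73b7c44be9e3dc73d05137b551779aee0"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in binary chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large model files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_is_intact(path, expected=EXPECTED_SHA256):
    """Compare the file's digest with the expected one (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

Calling `download_is_intact("/tmp/cmudict.languageProfile")` (the path is only an example) returning `True` would indicate that the file matches the published hash and can be loaded through the Simon settings dialog described in the reply above.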

  5. Hey Peter,

    Thanks a lot for the help.

    But Simon Listens and SPHINX can also be used for Speech to Text use and not just Text to Speech.

    Thanks and Regards.

    Suhas

    • We primarily do Speech to Text, not the other way around, yes.

  6. want some money?

    Maybe some money will help to motivate dictation’s inclusion in the distribution of the Simon software. Here is a small offer on FreedomSponsors that others may want to add to, as a token to be claimed on completion of the effort:

    https://freedomsponsors.org/issue/650/simon

    I am new to the FreedomSponsors thing; I believe you will need to request payment there. Thanks again for your efforts.
