You pick, I work: Dictation, Assistant or Translator?

A little while ago, I mentioned that I’ll be giving a talk about the current state of open source speech recognition at this year’s Akademy.
As part of that talk, I want to show off a tech-demo of a moonshot use case for open source speech recognition, to demonstrate not only what is already possible but also the limits of the current state of the art.

So a couple of days ago, I asked which application of speech recognition technology would be most interesting to you, and many of you responded. I extracted the three options that broadly cover all of the suggestions: dictation (like Dragon NaturallySpeaking), a virtual assistant (like Siri) and simultaneous translation (like Star Trek’s universal translator).

You now get to pick one of those three from the poll below.

After the poll closes (a week from now), I’ll take the idea that received the most votes and devote about a week to building a prototype based on currently available open source language processing tools. This prototype will then be demonstrated at this year’s Akademy.

Happy voting!

Poll: Dictation, Assistant or Translator?

  • Dictation (48%, 124 Votes)
  • Virtual personal assistant (36%, 94 Votes)
  • Simultaneous translation (16%, 41 Votes)

Total Voters: 259


Peter Grasch


    • Yes, I thought about that as well. I actually have a couple of ideas for each of the implementations in the poll.

      Then again, please keep in mind that this is a tech-demo, not a shippable product. That comes later.

  1. Dictation is important to all of the choices.

    If Simon implements full dictation, then all of the others can follow. Already, if you can launch KRunner through Simon and dictate the search, you can use the KDE web shortcuts to search Amazon, IMDb, DuckDuckGo and various other websites, which is almost an assistant. Dictation combined with the Nepomuk natural language parser makes for even more interesting uses. And of course, one would need dictation to even implement a live translator.

  2. I’m afraid this is still far from a good idea, because “Google lol cats” is really not a common “sentence”. Even if I do my very best (and I’m trying) to model expected sentences from a vast number of sources, it’s unlikely that I’ll hit “Google lol cats” even once.
    That means you’ll be fighting the recognizer’s “intuition”.
    Try it with a human if you want: the next time you chat with someone about something mundane (nothing technical or geek culture), respond to a question with “Google lol cats”. I’d be almost certain that she or he won’t understand it unless you repeat it. That’s because we humans do the same thing: we expect certain responses, and if one is really far off the mark, we usually need a second take to recognize it (case in point: “yeah…. wait, what?”).
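To make the recognizer’s “intuition” a bit more concrete, here is a minimal sketch, assuming a toy bigram language model with add-one smoothing. This is a deliberately simple stand-in for the statistical language models real recognizers use, and the corpus, `train_bigrams` and `perplexity` are hypothetical names made up for illustration. Perplexity measures how “surprised” the model is per word: lower means more expected.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for the "expected sentences" a
# recognizer's language model is trained on (real models use far more text).
CORPUS = [
    "what time is it",
    "open the web browser",
    "search the web for cats",
    "what is the weather like",
    "open my email",
]


def train_bigrams(sentences):
    """Count bigrams and their histories, with sentence-boundary markers."""
    history = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        history.update(tokens[:-1])  # every token that precedes another one
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return history, bigrams, vocab


def perplexity(sentence, history, bigrams, vocab):
    """Per-token perplexity under an add-one smoothed bigram model.

    Words never seen in training are mapped to a single <unk> token, so the
    model chooses among len(vocab) + 1 possible next tokens.
    """
    words = [w if w in vocab else "<unk>" for w in sentence.split()]
    tokens = ["<s>"] + words + ["</s>"]
    size = len(vocab) + 1
    log_p = 0.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        log_p += math.log((bigrams[(prev, word)] + 1) / (history[prev] + size))
    return math.exp(-log_p / (len(tokens) - 1))


MODEL = train_bigrams(CORPUS)

if __name__ == "__main__":
    for s in ["open my email", "search the web for cats", "google lol cats"]:
        print(f"{s!r}: perplexity {perplexity(s, *MODEL):.1f}")
```

Even in this tiny model, “google lol cats” comes out with a clearly higher perplexity than familiar phrases of similar length, because “google” and “lol” were never seen. A real recognizer combines such scores with the acoustic evidence, so an unexpected phrase has to “sound” much more convincing to win.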
