Over the years, I’ve been involved in many cool projects, and I cannot possibly list them all on this page. However, I have summarized a few of my more public projects below.
From 2013 to 2015, I researched speech-based recommender systems at the Graz University of Technology. The final prototype of the developed system was called “SpeechRec”, a conversational natural-language knowledge-based recommender system – in other words: a virtual shopping assistant.
SpeechRec was designed to work much like a human shopping assistant, and used extensive domain knowledge to extract nuanced information from a wide array of inputs (more than 800 key phrases). Natural language processing allowed SpeechRec to distinguish, for example, between “a little cheaper” and “a lot cheaper”. In parallel, the system employed paralinguistic analysis of the speaker’s “tone” to detect priorities and further enhance conflict resolution.
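To illustrate the idea of graded key phrases, here is a minimal sketch in Python. It is not SpeechRec's actual code: the phrase table, the `Critique` type, the function names, and the strength values are all hypothetical, chosen only to show how “a little cheaper” and “a lot cheaper” could map to different adjustments of the same attribute.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical key phrases (the real system used more than 800).
# Each maps to (attribute, direction, assumed relative strength).
KEY_PHRASES = {
    "a lot cheaper": ("price", -1, 0.30),
    "a little cheaper": ("price", -1, 0.10),
    "cheaper": ("price", -1, 0.20),
}


@dataclass
class Critique:
    attribute: str
    direction: int   # -1 = lower, +1 = higher
    strength: float  # assumed fraction of the current value


def parse_critique(utterance: str) -> Optional[Critique]:
    """Return the critique for the longest key phrase found, if any."""
    text = utterance.lower()
    # Check longer phrases first so "a lot cheaper" wins over plain "cheaper".
    for phrase in sorted(KEY_PHRASES, key=len, reverse=True):
        if phrase in text:
            return Critique(*KEY_PHRASES[phrase])
    return None


def apply_critique(current_value: float, critique: Critique) -> float:
    """Shift the current target value by the critique's relative strength."""
    return current_value * (1 + critique.direction * critique.strength)
```

With this table, an utterance containing “a little cheaper” lowers the target price by an assumed 10%, while “a lot cheaper” lowers it by an assumed 30%. A real system would also have to weigh such critiques against the user's other stated preferences.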
Internally, SpeechRec used a modified version of Simon (see below) employing CMU’s PocketSphinx decoder to provide speech recognition, and the OpenEAR framework for paralinguistic analysis. The avatar was designed and rendered in Blender 3D (a custom avatar library was written to animate real-time conversations) and voiced using MARY TTS. Among the many tools and libraries used to develop SpeechRec were MongoDB (to store domain knowledge), Stanford NLP and OpenNLP (various pre-processing, knowledge engineering), and Heritrix (knowledge engineering).
For a long time, most of my “productive” free time was spent on Simon: an open source speech recognition solution.
While listing the many components and features of the whole Simon suite would go beyond the scope of this page, here are some highlights:
- Extremely flexible design, making Simon language- and dialect-independent.
- Acoustic model creation and adaptation.
- Available command plug-ins cover the basics, such as keyboard and mouse simulation and starting programs, but also include a complete dialog system with voice output, integration with KDE’s PIM layer, both JSON and D-Bus connectors to third-party software, and many more.
- Server / client architecture with support for client collaboration to enable multi-station, single-user installations.
- Integrated sound server supporting multi-microphone setups and multi-threaded postprocessing.
- Elaborate context layer that uses contextual information to improve recognition accuracy or change commands based on the current situation.
- Mobile clients for MeeGo and BlackBerry.
- Sam, an acoustic model generator for power users; SSC, a distributed tool for large-scale sample acquisition; and more.
Due to its modular design, Simon also lends itself to integration in other projects and has been used successfully in various research projects, including an EU project investigating the use of a voice-controlled caregiving robot.
For more information, please visit the Simon homepage.
I am an avid cinephile, which led me to delve into the home theater “scene” a couple of years ago. Since then, this has become a hobby in itself.
My first setup included a projector and a separate touchscreen next to the couch that acted as a remote control. Because no media center supported such a setup at the time, I wrote “Melissa”.
Melissa featured a simple, touch-oriented, animated user interface, IMDb and trailer integration, and even handled encrypted volumes.
Sadly, after swapping the projector for a much more conventional large TV as part of moving to a much smaller apartment, Melissa was replaced with a slightly customized installation of XBMC and is thus no longer maintained.
If you are interested, the code is available on request.