SpeeG: a multimodal speech- and gesture-based text input solution

Lode Hoste, Bruno Dumas, Beat Signer

Research output: Contribution in Book/Catalog/Report/Conference proceeding (Conference contribution)



We present SpeeG, a multimodal speech- and body gesture-based text input system targeting media centres, set-top boxes and game consoles. Our controller-free zoomable user interface combines speech input with a gesture-based real-time correction of the recognised voice input. While the open source CMU Sphinx voice recogniser transforms speech input into written text, Microsoft's Kinect sensor is used for the hand gesture tracking. A modified version of the zoomable Dasher interface combines the input from Sphinx and the Kinect sensor. In contrast to existing speech error correction solutions with a clear distinction between a detection and correction phase, our innovative SpeeG text input system enables continuous real-time error correction. An evaluation of the SpeeG prototype has revealed that low error rates for a text input speed of about six words per minute can be achieved after a minimal learning phase. Moreover, in a user study SpeeG has been perceived as the fastest of all evaluated user interfaces and therefore represents a promising candidate for future controller-free text input.
Original language: English
Title of host publication: Proceedings of the International Working Conference on Advanced Visual Interfaces (AVI '12)
Publisher: ACM Press
Number of pages: 8
ISBN (Print): 9781450312875
Publication status: Published - 2012
Externally published: Yes


  • speech recognition
  • kinect
  • multimodal text input
  • gesture input

