Multimodal HCI output: Facial motion, gestures and synthesized speech synchronization (CROSBI ID 40611)
Book chapter | original scientific paper
Authorship data
Pandžić, Igor
English
Multimodal HCI output: Facial motion, gestures and synthesized speech synchronization
In this chapter we present an overview of the issues involved in generating multimodal output consisting of speech, facial motion and gestures. We start by introducing a basic audio-visual speech synthesis system that generates simple lip motion from input text using a TTS engine and an animation system. Throughout the chapter we gradually extend and improve this system, first with coarticulation, then with full facial motion and gestures, and finally we present it in the context of a complete Embodied Conversational Agent system. At each level we present the key concepts and discuss existing systems. We concentrate on real-time interactive systems, as required for HCI. This requires on-the-fly generation and synchronization of speech and animation, and does not allow for any time-consuming pre-processing. The practical issues that this requirement brings are discussed in the final section, which deals with obtaining timing information from the TTS engine. We concentrate on systems that accept plain text input (ASCII or Unicode) rather than those that require manual tagging of the text, because such tagging adds significant overhead to the implementation of any HCI application.
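The core synchronization idea described in the abstract can be sketched as follows: a TTS engine reports per-phoneme timing for the synthesized utterance, and these timings are mapped to viseme keyframes that the animation system plays back in lockstep with the audio. This is a minimal illustrative sketch only; the phoneme set, the `PHONEME_TO_VISEME` table and the tuple format are assumptions for the example, not taken from the chapter.

```python
# Hypothetical phoneme-to-viseme table; a real system would cover the
# full phoneme inventory of the TTS engine.
PHONEME_TO_VISEME = {
    "p": "lips_closed", "b": "lips_closed", "m": "lips_closed",
    "f": "lower_lip_teeth", "v": "lower_lip_teeth",
    "a": "open_wide", "o": "rounded", "u": "rounded",
    "sil": "neutral",
}

def viseme_keyframes(phoneme_timings):
    """Convert (phoneme, start_ms, duration_ms) tuples, as reported by
    a TTS engine, into (time_ms, viseme) keyframes for the animation
    player."""
    keyframes = []
    for phoneme, start_ms, duration_ms in phoneme_timings:
        viseme = PHONEME_TO_VISEME.get(phoneme, "neutral")
        # One keyframe at each phoneme onset; the coarticulation
        # extension discussed in the chapter would additionally blend
        # neighbouring visemes rather than switch abruptly.
        keyframes.append((start_ms, viseme))
    return keyframes

timings = [("sil", 0, 100), ("m", 100, 80), ("a", 180, 120), ("p", 300, 90)]
print(viseme_keyframes(timings))
```

Because the keyframes carry the same timestamps as the synthesized audio, playing both streams against a shared clock yields the on-the-fly synchronization the chapter requires, with no pre-processing step.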
character animation, multimodal speech synthesis
not recorded
not recorded
not recorded
not recorded
not recorded
not recorded
Contribution data
pp. 257-274.
published
Book data
Multimodal Signal Processing
Thiran, Jean-Philippe; Marques, Ferran; Bourlard, Herve (eds.)
Oxford: Academic Press
2010.
978-0-12-374825-6