Thank you to the following supporters of ICASSP 2007

Michael Johnston, AT&T Labs Research and Srinivas Bangalore, AT&T Labs Research
The ongoing convergence of the web with telephony, through technologies such as Voice over IP, high-speed mobile data networks, handheld computers, and smartphones, enables the creation of natural and highly effective multimodal interfaces for human-human communication and for human-machine interaction with automated services. These interfaces allow user input and system output to be optimally distributed over multiple modes, such as speech, pen, and graphical displays. Research on the computational processing and generation of language has primarily focussed on linear sequences of speech or text in which the primitive elements are phonemes, morphemes, or words. Multimodal language, by contrast, can be distributed over two or three spatial dimensions as well as the temporal dimension, and can involve additional primitive elements such as gestures, drawings, tables, and charts. This tutorial provides an overview of the problem of multimodal language processing, with detailed examples showing how representations and techniques from speech, language, and dialog processing can be extended and applied to the parsing, integration, and understanding of multimodal inputs and to the planning, generation, and presentation of multimodal outputs.
This tutorial is intended for students, researchers, and practitioners in speech, language, and dialog processing who want to see how techniques developed within the community can be applied to the creation of real-world multimodal interactive systems. It is introductory in nature, and no special knowledge or background is required.
Michael Johnston is a Senior Technical Specialist in the IP and Voice-enabled Services research lab of AT&T Labs - Research. His research interests span natural language processing, spoken and multimodal interactive systems, and human-computer interaction. For the last ten years, his work has focussed on extending language and dialog processing technologies to support multimodal interaction. In 1999, Dr. Johnston was awarded an NSF CAREER award for research on multimodal language processing for natural interfaces. He is also active in the creation of standards supporting spoken and multimodal interface development, and serves as editor-in-chief of the World Wide Web Consortium (W3C) EMMA: Extensible Multimodal Annotation specification. Dr. Johnston is a member of the IEEE Speech and Language Technical Committee (2006-2008), was an area chair for ACL 2004, and has served as a program committee member and reviewer for numerous international conferences, journals, and workshops.
Srinivas Bangalore is a Senior Technical Specialist in the IP and Voice-enabled Services research lab of AT&T Labs - Research. His research areas include speech and language processing topics related to parsing, machine translation, multimodal integration, and finite-state methods. His dissertation was on Supertagging, a robust parsing approach that combines the strengths of statistical and linguistic models of language processing. Topics he has worked on during the past ten years include tightly coupling speech recognition and language translation using finite-state speech translation approaches, a supertag-based surface realizer for natural language generation, and finite-state multimodal integration and understanding. Dr. Bangalore has served on the editorial board of the journal Computational Linguistics (2001-2003), was the workshop chair for ACL 2004, is a member of the IEEE Speech Technical Committee (2006-2008), and has served as a program committee member for a number of ACL and IEEE conferences and workshops.