Thank you to the following supporters of ICASSP 2007




Ciprian Chelba, Google and T. J. Hazen, MIT Computer Science and Artificial Intelligence Lab.
Ever-increasing computing power and connectivity bandwidth, together with falling storage costs, result in an overwhelming amount of data of various types being produced, exchanged, and stored. Consequently, search emerges as a key application as more and more data is saved.
Speech search has not received much attention because large collections of untranscribed spoken material have not been available, mostly due to storage constraints. As storage becomes cheaper, the availability and usefulness of large collections of spoken documents are limited strictly by the lack of adequate technology to exploit them. Manually transcribing speech is expensive and sometimes outright impossible due to privacy concerns. This leads us to explore an automatic approach to searching and navigating spoken document collections.
This tutorial will present an overview of speech transcription, indexing, and search technologies for spoken documents, with an emphasis on a corpus containing recorded academic lectures. The tutorial will point out general problems in this area and suggest possible solutions. It will include a discussion of scenarios and previous projects in the area of spoken document retrieval, issues of automatic transcription of long audio files (including vocabulary coverage, the out-of-vocabulary problem, and adaptation of speech recognition models), and techniques for the indexing and retrieval of spoken audio files (including a review of text retrieval techniques, automatic speech recognition lattice techniques, query processing, relevance scoring, and evaluation). Time permitting, a discussion of user interface issues for browsing audio files will also be included.
An outline of the tutorial is as follows:
Ciprian Chelba will present 1.A-B, and 3.A-E.
T. J. Hazen will present 1.C, 2.A-E, 3.F, and 4.
The material is presented as a basic tutorial that tries to balance a broad overview of the area with actual results from the authors' own work on the MIT iCampus lecture corpus.
Basic knowledge of probability theory and statistical modeling for speech and language should be sufficient to follow the tutorial.
Ciprian Chelba is a Research Scientist with Google. Previously he worked as a Researcher in the Speech Technology Group at Microsoft Research.
His core research interests are in statistical modeling of natural language and speech. Recent projects include speech content indexing for search in spoken documents, discriminative language modeling for large vocabulary speech recognition, as well as speech and text classification.
Timothy J. Hazen is a Research Scientist at the MIT Computer Science and Artificial Intelligence Laboratory, where he works in the areas of automatic speech recognition, automatic person identification, multi-modal speech processing, and conversational speech systems. For the last two years he has been a key contributor to the MIT Spoken Lecture Processing Project.