ICASSP 2007 - April 15-20, 2007 - Honolulu, Hawai'i, U.S.A.

50 years of progress in speech recognition technology -- Where we are, and where we should go

Date: Wednesday, April 18

Presented by

Sadaoki Furui, Tokyo Institute of Technology

Abstract

This talk surveys the past 50 years of automatic speech recognition (ASR) research, and suggests where we should focus our energies in the future. The history of ASR research is segmented into five periods: pre-history (before 1952), 1st generation (1952-1968), 2nd generation (1968-1980), 3rd generation (1980-1990), and the current 3.5th generation (1990-now). The transitions from one generation to the next are interestingly synchronized with advances in computer (IT) technology. The fundamentals of current state-of-the-art ASR technology were established during the 3rd generation period, during which the number of ICASSP papers on ASR rapidly increased. In 1986, for the first time, ASR papers represented a majority of all speech-related papers, and this has remained true for every year since. Starting with the transition to the 2nd generation period, the numbers of papers on ASR from the US and Japan have been consistently larger than those from any other country. However, in the last 10 years, the number of papers originating from other countries, particularly China, has increased significantly. Although ASR technology has made remarkable progress over the last 50 years, a large number of problems still need to be solved.

Speaker Biography

Sadaoki Furui received B.S., M.S., and Ph.D. degrees in mathematical engineering and instrumentation physics from Tokyo University, Tokyo, Japan in 1968, 1970, and 1978, respectively. He joined the Electrical Communications Laboratories of Nippon Telegraph and Telephone (NTT) Corporation in 1970, and later served as a Research Fellow and the Director of the Furui Research Laboratory at NTT Human Interface Laboratories, from 1991 to 1997. He is currently a Professor in the Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology. He is also a Vice Dean of the Graduate School of Information Science and Engineering. His research interests include analysis of speaker characterization information in speech waves and its application to speaker recognition as well as interspeaker normalization and adaptation in speech recognition. He is also interested in vector-quantization-based speech recognition algorithms, spectral dynamic features for speech recognition, speech recognition algorithms that are robust against noise and distortion, algorithms for Japanese large-vocabulary continuous-speech recognition, automatic speech summarization algorithms, multimodal human-computer interaction systems, automatic question-answering systems, and analysis of the speech perception mechanism. He has authored or coauthored over 700 published articles. From December 1978 to December 1979, he served on the staff of the Acoustics Research Department of Bell Laboratories, Murray Hill, New Jersey, as a visiting researcher working on speaker verification. Dr. Furui is a Fellow of the IEEE, the Acoustical Society of America and the Institute of Electronics, Information and Communication Engineers of Japan (IEICE).
He served as President of the Permanent Council of International Conferences on Spoken Language Processing (PC-ICSLP) from 2000 to 2004, the International Speech Communication Association (ISCA) from 2001 to 2005, and the Acoustical Society of Japan (ASJ) from 2001 to 2003. He served on the IEEE Technical Committees on Speech as well as Multimedia Signal Processing, and the Technical Program Committees of ICASSP86 in Tokyo as well as ICSLP90 in Kobe. He served on ICSLP94 in Yokohama as Vice Chairman of the Conference Committee. He has organized various international conferences and workshops including the 1997 IEEE Workshop on Automatic Speech Recognition and Understanding. He has also served on several international advisory boards in the US and Europe. He served as a Board member of the IEEE Signal Processing Society from 2001 to 2003. He served as an Editor-in-Chief of the Journal of Speech Communication from 1997 to 2001, Chief Editor of the Journal of the ASJ from 1997 to 1999, and Chief Editor of the English Journal of IEICE from 2001 to 2003. He also served as an IEEE Press Editorial Board member from 1995 to 1999. He is now serving as an Editorial Board member of the Journal of Computer Speech and Language and the Journal of Speech Communication. He is also serving as a Board member of the IEICE. He supervised the five-year Japanese Science and Technology Agency Priority Program entitled “Spontaneous Speech: Corpus and Processing Technology” from 1999 to 2004. He has supervised the 21st Century Center of Excellence (COE) Program entitled “Framework for Systematization and Application of Large-scale Knowledge Resources” since its inception in 2003. He received the Yonezawa Prize and the Paper Award from the IEICE in 1975, 1988, 1993 and 2003, and the Sato Paper Award from the ASJ in 1985 and 1987. He received the Senior Award from the IEEE ASSP Society and the Achievement Award from the Minister of Science and Technology, both in 1989. 
He received the Book Award from the IEICE in 1990 and the Technical Achievement Award from the IEICE in 2003. He received the IEEE Signal Processing Society Award, the Achievement Award from the Minister of Education, Culture, Sports, Science and Technology, and the Purple Ribbon Medal from the Japanese Emperor in 2006. He also received the Mira Paul Memorial Award from the AFECT, India in 2001. He was a Distinguished Lecturer of the IEEE Signal Processing Society from 1993 to 1994. He is the author of “Digital Speech Processing, Synthesis, and Recognition” (Marcel Dekker, 1989, revised in 2000) in English, “Digital Speech Processing” (Tokai University Press, 1985) in Japanese, “Acoustics and Speech Processing” (Kindai-Kagaku-Sha, 1992, revised in 2006) in Japanese, and “Speech Information Processing” (Morikita, 1998) in Japanese. He has co-authored “Image and Speech Processing Technology” (Denpa-Shinbun-Sha, 2004) in Japanese. He has edited “Advances in Speech Signal Processing” (Marcel Dekker, 1992) jointly with Dr. M.M. Sondhi. He has translated into Japanese “Fundamentals of Speech Recognition,” authored by Drs. L.R. Rabiner and B.-H. Juang (NTT Advanced Technology, 1995) and “Vector Quantization and Signal Compression,” authored by Drs. A. Gersho and R. M. Gray (Corona-sha, 1998).

