An Integrated Deep Learning Approach to Acoustic Signal Pre-processing and Acoustic Modeling with Applications to Robust Automatic Speech Recognition
Chin-Hui Lee
School of ECE, Georgia Tech, USA
We cast the classical speech processing problem into a new nonlinear regression setting by using deep neural networks (DNNs) to map log power spectral features of noisy speech to those of clean speech. DNN-enhanced speech obtained by the proposed approach demonstrates better speech quality and intelligibility than speech obtained with conventional state-of-the-art algorithms. Furthermore, this new paradigm also facilitates an integrated deep learning framework that trains the three key modules of an automatic speech recognition (ASR) system, namely signal conditioning, feature extraction and acoustic phone models, in a unified manner. The proposed framework was tested on the recent, challenging ASR tasks in CHiME-2, CHiME-4 and REVERB, which are designed to evaluate ASR robustness in mixed-speaker, multi-channel, and reverberant conditions. Leveraging this new approach, our team achieved the lowest word error rates in all three tasks, using acoustic pre-processing algorithms for speech separation, microphone-array-based speech enhancement and speech dereverberation.
Chin-Hui Lee is a professor at the School of Electrical and Computer Engineering, Georgia Institute of Technology. Before joining academia in 2001, he had accumulated 20 years of industrial experience, ending at Bell Laboratories, Murray Hill, as a Distinguished Member of Technical Staff and Director of the Dialogue Systems Research Department. Dr. Lee is a Fellow of the IEEE and a Fellow of ISCA. He has published over 450 papers and 30 patents, with more than 30,000 citations and an h-index of 70 on Google Scholar. He has received numerous awards, including the Bell Labs President’s Gold Award in 1998. He won the SPS’s 2006 Technical Achievement Award for “Exceptional Contributions to the Field of Automatic Speech Recognition”. In 2012 he gave an ICASSP plenary talk on the future of automatic speech recognition. In the same year he was awarded the ISCA Medal in scientific achievement for “pioneering and seminal contributions to the principles and practice of automatic speech and speaker recognition”.
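As a rough illustration of the regression setting described in this abstract, the sketch below computes log power spectral features for one frame and a mean squared error between a predicted and a clean feature vector. The function names, the naive DFT, the flooring constant and the choice of mean squared error as the objective are all assumptions made for illustration; the DNN itself is omitted, and nothing here is taken from the speaker's implementation.

```python
import cmath
import math


def log_power_spectrum(frame, floor=1e-12):
    """Log power spectrum of one real-valued frame (naive DFT, bins 0..N/2)."""
    n = len(frame)
    features = []
    for k in range(n // 2 + 1):
        # DFT coefficient for bin k.
        coeff = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        # A small floor (an assumption) avoids log(0) for empty bins.
        features.append(math.log(abs(coeff) ** 2 + floor))
    return features


def mean_squared_error(predicted, target):
    """Regression loss between predicted and clean feature vectors (assumed objective)."""
    return sum((p - q) ** 2 for p, q in zip(predicted, target)) / len(target)


# Toy frame: a cosine that falls exactly on DFT bin 2 of a 16-sample frame.
clean_frame = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
clean_features = log_power_spectrum(clean_frame)
```

In a training setup along these lines, a network would take the features of a noisy frame as input and be fitted so that its output is close, in the sense of the loss above, to the features of the matching clean frame.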
Audio Equalization and Reverberation
Vesa Välimäki
Aalto University, Finland
This talk will review advances in two audio signal processing topics, equalization and artificial reverberation, which are needed in augmented and virtual reality audio. The graphic equalizer is a standard tool in music and audio production that allows free adjustment of the gain in several frequency bands. The control of the gains can be manual or automatic, depending on the application. The underlying signal processing structure is either a parallel or a cascade IIR filter. In the past few years, we have learned, at last, how to design such filters accurately. Example applications of automatic audio equalization will be discussed in this talk. Artificial reverberation has a long history, but exciting new ideas are introduced continuously. Whereas a large proportion of artificial reverberation research has focused on the imitation of concert hall acoustics, the modeling of outdoor acoustic environments has become important for gaming, virtual reality, and simulation of noise propagation. The use of velvet noise, a sparse pseudo-random sequence, will be described for creating computationally efficient reverberation effects.
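To make the idea of velvet noise concrete, here is a minimal sketch under stated assumptions: one pulse of value +1 or -1 is placed at a random position inside each grid period, and all other samples are zero. The function names, the sample rate, the pulse density and the seed are illustrative choices, not values from the talk.

```python
import random


def velvet_noise(num_samples, fs=44100, density=2000, seed=0):
    """Return pulse positions and signs of a velvet-noise sequence.

    density is the average number of pulses per second (assumed value).
    """
    rng = random.Random(seed)
    grid = fs / density  # average pulse spacing in samples
    positions, signs = [], []
    m = 0
    while True:
        # One pulse per grid period, jittered uniformly inside the period.
        k = int(round(m * grid + rng.random() * (grid - 1)))
        if k >= num_samples:
            break
        positions.append(k)
        signs.append(1 if rng.random() < 0.5 else -1)
        m += 1
    return positions, signs


def sparse_convolve(x, positions, signs):
    """Convolve x with the sparse sequence using only additions and subtractions."""
    y = [0.0] * (len(x) + positions[-1])
    for k, s in zip(positions, signs):
        for n, v in enumerate(x):
            y[n + k] += s * v
    return y


positions, signs = velvet_noise(44100)
```

Because the sequence contains only a few nonzero samples, each equal to +1 or -1, the convolution loop needs no multiplications by filter coefficients, which is where the computational saving comes from.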
Prof. Vesa Välimäki is the Vice Dean for research at the Aalto University School of Electrical Engineering, Espoo, Finland. He is a Full Professor of audio signal processing at Aalto University. He received the Master of Science in Technology and the Doctor of Science in Technology degrees, both in electrical engineering, from the Helsinki University of Technology, Espoo, Finland, in 1992 and 1995, respectively. In 1996, he was a Postdoctoral Research Fellow at the University of Westminster, London, UK. In 2008-2009, he was a Visiting Scholar at the Center for Computer Research in Music and Acoustics (CCRMA), Stanford University, Stanford, CA, USA. He is a Fellow of the AES (Audio Engineering Society), a Fellow of the IEEE, and a Life Member of the Acoustical Society of Finland. He is a Senior Area Editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing. In 2016, he was the Guest Editor of the special issue of Applied Sciences on audio signal processing. He was the Chairman of the International Conference on Digital Audio Effects, DAFx-08, in 2008, and was the Chairman of the Sound and Music Computing Conference, SMC-17, in 2017.
Why Deep Learning Networks Work So Well?
C.-C. Jay Kuo
University of Southern California, USA
The convolutional neural network (CNN) is now a powerful tool for image and video processing and understanding. In this talk, I will build a bridge between traditional single-layer signal representation methods, such as the Fourier, wavelet and sparse representations, and the modern multi-layer signal analysis approach based on CNNs. To begin with, I introduce the RECOS transform as a basic building block of CNNs, where “RECOS” is an acronym for “REctified-COrrelations on a Sphere”. It consists of two main concepts: data clustering on a sphere and rectification. Then, I interpret a CNN as a network that implements the guided multi-layer RECOS transform. Besides offering a full explanation of the operating principle of CNNs, I discuss how guidance is provided by labels through backpropagation (BP) in training and show that CNNs can support a full spectrum of learning paradigms, from unsupervised and weakly supervised to fully supervised learning.
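One plausible reading of "rectified correlations on a sphere" can be sketched as follows: the input and a set of anchor vectors are scaled to unit length, their correlations are computed as inner products, and negative correlations are clipped to zero. The function names and the toy anchor vectors are assumptions for illustration; the abstract does not specify the exact formulation.

```python
import math


def normalize(v):
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [c / norm for c in v]


def recos(x, anchors):
    """Rectified correlations between x and each anchor vector (assumed form)."""
    x_unit = normalize(x)
    outputs = []
    for anchor in anchors:
        correlation = sum(a * b for a, b in zip(normalize(anchor), x_unit))
        # Rectification: negative correlations are set to zero.
        outputs.append(max(0.0, correlation))
    return outputs


# Toy anchors: aligned with, opposite to, and orthogonal to the input.
responses = recos([2.0, 0.0], [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
```

In this reading the anchor vectors play the role of filter weights, and the rectification plays the role of the ReLU nonlinearity in a CNN layer.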
Dr. C.-C. Jay Kuo received his Ph.D. degree from the Massachusetts Institute of Technology in 1987. He is now with the University of Southern California (USC) as Director of the Media Communications Laboratory and Dean’s Professor in Electrical Engineering-Systems. His research interests are in the areas of digital media processing, compression, communication and networking technologies. Dr. Kuo was the Editor-in-Chief for the IEEE Trans. on Information Forensics and Security in 2012-2014. He was the Editor-in-Chief for the Journal of Visual Communication and Image Representation in 1997-2011, and served as Editor for 10 other international journals. Dr. Kuo received the 1992 National Science Foundation Young Investigator (NYI) Award, the 1993 National Science Foundation Presidential Faculty Fellow (PFF) Award, the 2010 Electronic Imaging Scientist of the Year Award, the 2010-11 Fulbright-Nokia Distinguished Chair in Information and Communications Technologies, the 2011 Pan Wen-Yuan Outstanding Research Award, the 2014 USC Northrop Grumman Excellence in Teaching Award, the 2016 USC Associates Award for Excellence in Teaching, the 2016 IEEE Computer Society Taylor L. Booth Education Award, the 2016 IEEE Circuits and Systems Society John Choma Education Award, the 2016 IS&T Raymond C. Bowman Award, and the 2017 IEEE Leon K. Kirchmayer Graduate Teaching Award. Dr. Kuo is a Fellow of AAAS, IEEE and SPIE. He has guided 140 students to their Ph.D. degrees and supervised 25 postdoctoral research fellows. Dr. Kuo is a co-author of about 250 journal papers, 900 conference papers and 14 books.
Smart Radios for Smart Life
K. J. Ray Liu
University of Maryland, College Park, USA
What smart impact will future 5G and IoT bring to our lives? Many may wonder, and even speculate, but do we really know? With more and more bandwidth readily available for the next generation of wireless applications, many smart applications and services unimaginable today may become possible. In this talk, we will show that with more bandwidth, one can see many multi-paths, which can serve as hundreds of virtual antennas that can be leveraged as new degrees of freedom for smart life. Together with the fundamental physical principle of time reversal to focus energy at specific positions, or more generally by employing waveforming, a revolutionary smart radio platform can be built to enable many cutting-edge IoT applications that have been envisioned for a long time, but have never been achieved. We will show the world’s first centimeter-accuracy wireless indoor positioning system, which can offer indoor GPS-like capability to track humans or any indoor objects without any additional infrastructure, as long as WiFi or LTE is available. Such a technology forms the core of a smart radio platform that can be applied to home/office monitoring/security, radio human biometrics, vital signs detection, wireless charging, and 5G communications. In essence, in the future wireless world, communication will be just a small component of what’s possible. There are many more magic-like smart applications that can be made possible, such as answering questions like how many people are next door, who they are, and what they are doing, without any sensors deployed next door. Some demo videos will be shown to illustrate the future of smart radios for smart life.
Dr. K. J. Ray Liu is the founder of Origin Wireless, Inc., a high-tech start-up developing smart radios for smart life. He was named a Distinguished Scholar-Teacher of the University of Maryland, College Park, in 2007, where he is Christine Kim Eminent Professor of Information Technology. Dr. Liu was a recipient of the 2016 IEEE Leon K. Kirchmayer Technical Field Award on graduate teaching and mentoring, the IEEE Signal Processing Society 2014 Society Award for “influential technical contributions and profound leadership impact”, the IEEE Signal Processing Society 2009 Technical Achievement Award, and more than a dozen best paper awards. Recognized by Thomson Reuters as a Highly Cited Researcher, he is a Fellow of IEEE and AAAS. Dr. Liu is a member of the IEEE Board of Directors. He was President of the IEEE Signal Processing Society, where he also served as Vice President – Publications and Editor-in-Chief of IEEE Signal Processing Magazine. He also received teaching and research recognitions from the University of Maryland, including the university-level Invention of the Year Award (three times) and the college-level Poole and Kent Senior Faculty Teaching Award, Outstanding Faculty Research Award, and Outstanding Faculty Service Award, all from the A. James Clark School of Engineering (each award is given once per year across the entire college).
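The focusing effect of time reversal mentioned in this abstract can be illustrated with a toy model, assuming real-valued multipath channel taps: the transmitter sends the time-reversed impulse response of the channel to the intended location, so the signal received there is the channel's autocorrelation, with a peak equal to the total multipath energy. The tap values and function names below are made up for illustration.

```python
def convolve(a, b):
    """Full linear convolution of two real-valued sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def time_reversal_response(h_target, h_observed):
    """Signal seen through h_observed when the probe is h_target reversed in time.

    Taps are real here, so the complex conjugation used in general is a no-op.
    """
    probe = h_target[::-1]
    return convolve(probe, h_observed)


# Hypothetical multipath channels to two different locations.
h_a = [0.9, -0.5, 0.3, 0.2]
h_b = [0.4, 0.6, -0.7, 0.1]

focus = time_reversal_response(h_a, h_a)  # at the intended location
leak = time_reversal_response(h_a, h_b)   # at another location
```

In this toy case the peak of `focus` sits at the center tap and equals the sum of squared taps of `h_a`, while every sample of `leak` is smaller in magnitude. Each additional resolvable path adds to the peak, which is one way to read the "virtual antennas" remark in the abstract.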