Robust Automatic Speech Recognition

by Jinyu Li, Li Deng, Reinhold Haeb-Umbach, Yifan Gong

October 2015

Intermediate to advanced

306 pages

10h 38m

English

Academic Press

Abstract
1.1 Automatic Speech Recognition
1.2 Robustness to Noisy Environments
1.3 Existing Surveys in the Area
1.4 Book Structure Overview

Abstract
2.1 Introduction: Components of Speech Recognition
2.2 Gaussian Mixture Models
2.3 Hidden Markov Models and the Variants
2.4 Deep Learning and Deep Neural Networks
2.5 Summary

Abstract
3.1 Standard Evaluation Databases
3.2 Modeling Distortions of Speech in Acoustic Environments
3.3 Impact of Acoustic Distortion on Gaussian Modeling
3.4 Impact of Acoustic Distortion on DNN Modeling
3.5 A General Framework for Robust Speech Recognition
3.6 Categorizing Robust ASR Techniques: An Overview
3.7 Summary

Abstract
4.1 Feature-Space Approaches
4.2 Model-Space Approaches
4.3 Summary

Abstract
5.1 Learning from Stereo Data
5.2 Learning from Multi-Environment Data
5.3 Summary

Abstract
6.1 Parallel Model Combination
6.2 Vector Taylor Series
6.3 Sampling-Based Methods
6.4 Acoustic Factorization
6.5 Summary

Abstract
7.1 Model-Domain Uncertainty
7.2 Feature-Domain Uncertainty
7.3 Joint Uncertainty Decoding
7.4 Missing-Feature Approaches
7.5 Summary

Abstract
8.1 Speaker Adaptive and Source Normalization Training
8.2 Model Space Noise Adaptive Training
8.3 Joint Training for DNN
8.4 Summary

Abstract
9.1 Introduction
9.2 Acoustic Impulse Response
9.3 A Model of Reverberated Speech in Different Domains
9.4 The Effect of Reverberation on ASR Performance
9.5 Linear Filtering Approaches
9.6 Magnitude or Power Spectrum Enhancement
9.7 Feature Domain Approaches
9.8 Acoustic Model Domain Approaches
9.9 The REVERB Challenge
9.10 To Probe Further
9.11 Summary

Abstract
10.1 Introduction
10.2 The Acoustic Beamforming Problem
10.3 Fundamentals of Data-Dependent Beamforming
10.4 Multi-Channel Speech Recognition
10.5 To Probe Further
10.6 Summary

Abstract
11.1 Robust Methods in the Era of GMM
11.2 Robust Methods in the Era of DNN
11.3 Multi-Channel Input and Robustness to Reverberation
11.4 Epilogue

Content preview from Robust Automatic Speech Recognition

List of Figures

Fig. 1.1 From thoughts to speech.

Fig. 2.1 Illustration of the CD-DNN-HMM and its three core components.

Fig. 2.2 Illustration of the CNN in which the convolution is applied along frequency bands.

Fig. 3.1 A model of acoustic environment distortion in the discrete-time domain relating the clean speech sample x[m] to the distorted speech sample y[m].

Fig. 3.2 Cepstral distribution of word oh in Aurora 2.

Fig. 3.3 The impact of noise, with varying mean values from 5 in (a) to 25 in (d), in the log-Mel-filter-bank domain. The clean speech has a mean value of 25 and a standard deviation of 10. The noise has a standard deviation of 2.

Fig. 3.4 Impact of noise with different standard deviation values in the log-Mel-filter-bank ...
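
The setup described in the caption of Fig. 3.3 can be approximated with a small simulation. The sketch below is not taken from the book: it assumes clean speech and noise are Gaussian in the log-Mel-filter-bank domain and combine additively in the linear domain, so that y = log(exp(x) + exp(n)), with channel and phase effects ignored. The function name `simulate_noisy_log_mel` and the sample count are hypothetical choices for illustration; only the means and standard deviations come from the caption.

```python
import math
import random
import statistics


def simulate_noisy_log_mel(noise_mean, n_samples=20000, seed=0,
                           speech_mean=25.0, speech_std=10.0, noise_std=2.0):
    """Return (mean, std) of simulated noisy log-Mel values.

    Assumed distortion model (not quoted from the book):
    y = log(exp(x) + exp(n)), with x and n drawn from Gaussians.
    """
    rng = random.Random(seed)
    noisy = []
    for _ in range(n_samples):
        x = rng.gauss(speech_mean, speech_std)  # clean speech, log domain
        n = rng.gauss(noise_mean, noise_std)    # noise, log domain
        # Numerically stable log(exp(x) + exp(n)).
        m = max(x, n)
        y = m + math.log(math.exp(x - m) + math.exp(n - m))
        noisy.append(y)
    return statistics.mean(noisy), statistics.stdev(noisy)


# The two end points named in the caption: noise mean 5 and noise mean 25.
for noise_mean in (5, 25):
    mean, std = simulate_noisy_log_mel(noise_mean)
    print(f"noise mean {noise_mean}: noisy mean {mean:.2f}, noisy std {std:.2f}")
```

Under this assumed model, raising the noise mean shifts the noisy distribution upward and narrows it, because the noise masks the low-energy part of the speech distribution.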

Publisher Resources

ISBN: 9780128026168