Techniques for Noise Robustness in Automatic Speech Recognition

by Rita Singh, Tuomas Virtanen, Bhiksha Raj

November 2012

Intermediate to advanced

514 pages

17h 40m

English

Wiley

1.1 Scope of the Book; 1.2 Outline; 1.3 Notation
2.1 Introduction; 2.2 Speech Recognition Viewed as Bayes Classification; 2.3 Hidden Markov Models; 2.4 HMM-Based Speech Recognition
3.1 Errors in Bayes Classification; 3.2 Bayes Classification and ASR; 3.3 External Influences on Speech Recordings; 3.4 The Effect of External Influences on Recognition; 3.5 Improving Recognition under Adverse Conditions

4.1 Introduction; 4.2 Signal Analysis and Synthesis; 4.3 Voice Activity Detection; 4.4 Noise Power Spectrum Estimation; 4.5 Adaptive Filters for Signal Enhancement; 4.6 ASR Performance; 4.7 Conclusions
5.1 The Problem with Mixtures; 5.2 Multichannel Mixtures; 5.3 Single-Channel Mixtures; 5.4 Variations and Extensions; 5.5 Conclusions
6.1 Speaker Tracking; 6.2 Conventional Microphone Arrays; 6.3 Conventional Adaptive Beamforming Algorithms; 6.4 Spherical Microphone Arrays; 6.5 Spherical Adaptive Algorithms; 6.6 Comparative Studies; 6.7 Comparison of Linear and Spherical Arrays for DSR; 6.8 Conclusions and Further Reading
7.1 Introduction; 7.2 The Speech Signal; 7.3 Spectral Processing; 7.4 Cepstral Processing; 7.5 Influence of Distortions on Different Speech Features; 7.6 Summary and Further Reading
8.1 Introduction; 8.2 Some Attributes of Auditory Physiology and Perception; 8.3 “Classic” Auditory Representations; 8.4 Current Trends in Auditory Feature Analysis; 8.5 Summary; Acknowledgments
9.1 Life in an Ideal World; 9.2 MMSE-SPLICE; 9.3 Discriminative SPLICE; 9.4 Model-Based Feature Enhancement; 9.5 Switching Linear Dynamic System; 9.6 Conclusion
10.1 Introduction; 10.2 The Effect of Reverberation; 10.3 Approaches to Reverberant Speech Recognition; 10.4 Feature Domain Model of the Acoustic Impulse Response; 10.5 Bayesian Feature Enhancement; 10.6 Experimental Results; 10.7 Conclusions; Acknowledgment
11.1 Introduction; 11.2 Acoustic Model Adaptation and Noise Robustness; 11.3 Maximum A Posteriori Reestimation; 11.4 Maximum Likelihood Linear Regression; 11.5 Discriminative Training; 11.6 Conclusion
12.1 Introduction; 12.2 The Model-Based Approach; 12.3 Signal Feature Domains; 12.4 Interaction Models; 12.5 Inference Methods; 12.6 Efficient Likelihood Evaluation in Factorial Models; 12.7 Current Directions
13.1 Introduction; 13.2 Traditional Training Methods for Robust Speech Recognition; 13.3 A Brief Overview of Speaker Adaptive Training; 13.4 Feature-Space Noise Adaptive Training; 13.5 Model-Space Noise Adaptive Training; 13.6 Noise Adaptive Training using VTS Adaptation; 13.7 Discussion; 13.8 Conclusion
14.1 Introduction; 14.2 Classification with Incomplete Data; 14.3 Energetic Masking; 14.4 Meta-Missing Data: Dealing with Mask Uncertainty; 14.5 Some Perspectives on Performance
15.1 Introduction; 15.2 Missing-Data Techniques; 15.3 Correlation-Based Imputation; 15.4 Cluster-Based Imputation; 15.5 Class-Conditioned Imputation; 15.6 Sparse Imputation; 15.7 Other Feature-Reconstruction Methods; 15.8 Experimental Results; 15.9 Discussion and Conclusion; Acknowledgments
16.1 Introduction; 16.2 Auditory Scene Analysis; 16.3 Computational Auditory Scene Analysis; 16.4 CASA Strategies; 16.5 Integrating CASA with ASR; 16.6 Concluding Remarks; Acknowledgment
17.1 Introduction; 17.2 Observation Uncertainty; 17.3 Uncertainty Decoding; 17.4 Feature-Based Uncertainty Decoding; 17.5 Model-Based Joint Uncertainty Decoding; 17.6 Noisy CMLLR; 17.7 Uncertainty and Adaptive Training; 17.8 In Combination with Other Techniques; 17.9 Conclusions

Content preview from Techniques for Noise Robustness in Automatic Speech Recognition

The Problem of Robustness in Automatic Speech Recognition

Bhiksha Raj¹, Tuomas Virtanen², Rita Singh¹

¹Carnegie Mellon University, USA; ²Tampere University of Technology, Finland

This chapter deals primarily not with what makes automatic speech-recognition systems (ASRs) work, but with some of the factors that make them go wrong. As mentioned earlier in Section 1.1, ASR systems often make errors in conditions in which a human listener could continue to hold a conversation effortlessly. Most real-life situations where people converse with one another or with an automated system are fraught with acoustic adversity. The speech that is finally heard may be distorted by a variety of external influences, not related to what was spoken, which affect its characteristics. While humans are not affected by them, ASR systems can be highly sensitive to these distortions. In other words, ASR systems are not robust to distortions in the speech signal in the manner that humans are. In this chapter, we discuss some of the reasons for this lack of robustness.

We recall that the problem of automatic speech recognition is fundamentally one of Bayesian classification. Recognition errors in ASR systems are a consequence of misclassification. Therefore, we begin by briefly discussing the rationale behind Bayesian classification and the conditions under which it can perform poorly. Later in the chapter, we relate these to the causes for errors in ASR, describe the various types of distortions that affect ...
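
The link between Bayesian classification and distortion-induced errors can be illustrated with a toy sketch. This is not the chapter's own model. It assumes two hypothetical classes, "a" and "b", each with a one-dimensional Gaussian likelihood and equal priors, and a distortion modeled as a constant offset added to every observation. The classifier picks the class that maximizes prior times likelihood; the names `bayes_classify`, `error_rate`, and `CLASSES` are illustrative only.

```python
import math
import random


def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def bayes_classify(x, classes):
    """Return the label maximizing log prior + log likelihood.

    classes maps label -> (prior, mean, variance); all values are assumed.
    """
    return max(
        classes,
        key=lambda c: math.log(classes[c][0])
        + log_gaussian(x, classes[c][1], classes[c][2]),
    )


def error_rate(classes, offset, n=20000, seed=0):
    """Estimate the misclassification rate when every observation is
    shifted by `offset` (a stand-in for an external distortion)."""
    rng = random.Random(seed)
    labels = list(classes)
    priors = [classes[c][0] for c in labels]
    errors = 0
    for _ in range(n):
        true_label = rng.choices(labels, priors)[0]
        _, mean, var = classes[true_label]
        x = rng.gauss(mean, math.sqrt(var)) + offset
        if bayes_classify(x, classes) != true_label:
            errors += 1
    return errors / n


# Hypothetical class models: equal priors, unit variance, means -1 and +1.
CLASSES = {"a": (0.5, -1.0, 1.0), "b": (0.5, 1.0, 1.0)}

matched = error_rate(CLASSES, offset=0.0)     # test data matches the models
mismatched = error_rate(CLASSES, offset=1.5)  # test data is distorted
print(f"matched error:    {matched:.3f}")
print(f"mismatched error: {mismatched:.3f}")
```

In this sketch the classifier itself is unchanged between the two runs; only the observations are shifted, so any rise in the error rate comes from the mismatch between the data and the class models.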

Publisher Resources

ISBN: 9781118392669

You might also like

Audio Source Separation and Speech Enhancement

Hidden Semi-Markov Models

Parametric Time-Frequency Domain Spatial Audio

Robust Automatic Speech Recognition
