Techniques for Noise Robustness in Automatic Speech Recognition

by Rita Singh, Tuomas Virtanen, Bhiksha Raj

November 2012

Intermediate to advanced

514 pages

17h 40m

English

Wiley

1.1 Scope of the Book; 1.2 Outline; 1.3 Notation
2.1 Introduction; 2.2 Speech Recognition Viewed as Bayes Classification; 2.3 Hidden Markov Models; 2.4 HMM-Based Speech Recognition
3.1 Errors in Bayes Classification; 3.2 Bayes Classification and ASR; 3.3 External Influences on Speech Recordings; 3.4 The Effect of External Influences on Recognition; 3.5 Improving Recognition under Adverse Conditions

4.1 Introduction; 4.2 Signal Analysis and Synthesis; 4.3 Voice Activity Detection; 4.4 Noise Power Spectrum Estimation; 4.5 Adaptive Filters for Signal Enhancement; 4.6 ASR Performance; 4.7 Conclusions
5.1 The Problem with Mixtures; 5.2 Multichannel Mixtures; 5.3 Single-Channel Mixtures; 5.4 Variations and Extensions; 5.5 Conclusions
6.1 Speaker Tracking; 6.2 Conventional Microphone Arrays; 6.3 Conventional Adaptive Beamforming Algorithms; 6.4 Spherical Microphone Arrays; 6.5 Spherical Adaptive Algorithms; 6.6 Comparative Studies; 6.7 Comparison of Linear and Spherical Arrays for DSR; 6.8 Conclusions and Further Reading
7.1 Introduction; 7.2 The Speech Signal; 7.3 Spectral Processing; 7.4 Cepstral Processing; 7.5 Influence of Distortions on Different Speech Features; 7.6 Summary and Further Reading
8.1 Introduction; 8.2 Some Attributes of Auditory Physiology and Perception; 8.3 “Classic” Auditory Representations; 8.4 Current Trends in Auditory Feature Analysis; 8.5 Summary; Acknowledgments
9.1 Life in an Ideal World; 9.2 MMSE-SPLICE; 9.3 Discriminative SPLICE; 9.4 Model-Based Feature Enhancement; 9.5 Switching Linear Dynamic System; 9.6 Conclusion
10.1 Introduction; 10.2 The Effect of Reverberation; 10.3 Approaches to Reverberant Speech Recognition; 10.4 Feature Domain Model of the Acoustic Impulse Response; 10.5 Bayesian Feature Enhancement; 10.6 Experimental Results; 10.7 Conclusions; Acknowledgment
11.1 Introduction; 11.2 Acoustic Model Adaptation and Noise Robustness; 11.3 Maximum A Posteriori Reestimation; 11.4 Maximum Likelihood Linear Regression; 11.5 Discriminative Training; 11.6 Conclusion
12.1 Introduction; 12.2 The Model-Based Approach; 12.3 Signal Feature Domains; 12.4 Interaction Models; 12.5 Inference Methods; 12.6 Efficient Likelihood Evaluation in Factorial Models; 12.7 Current Directions
13.1 Introduction; 13.2 Traditional Training Methods for Robust Speech Recognition; 13.3 A Brief Overview of Speaker Adaptive Training; 13.4 Feature-Space Noise Adaptive Training; 13.5 Model-Space Noise Adaptive Training; 13.6 Noise Adaptive Training using VTS Adaptation; 13.7 Discussion; 13.8 Conclusion
14.1 Introduction; 14.2 Classification with Incomplete Data; 14.3 Energetic Masking; 14.4 Meta-Missing Data: Dealing with Mask Uncertainty; 14.5 Some Perspectives on Performance
15.1 Introduction; 15.2 Missing-Data Techniques; 15.3 Correlation-Based Imputation; 15.4 Cluster-Based Imputation; 15.5 Class-Conditioned Imputation; 15.6 Sparse Imputation; 15.7 Other Feature-Reconstruction Methods; 15.8 Experimental Results; 15.9 Discussion and Conclusion; Acknowledgments
16.1 Introduction; 16.2 Auditory Scene Analysis; 16.3 Computational Auditory Scene Analysis; 16.4 CASA Strategies; 16.5 Integrating CASA with ASR; 16.6 Concluding Remarks; Acknowledgment
17.1 Introduction; 17.2 Observation Uncertainty; 17.3 Uncertainty Decoding; 17.4 Feature-Based Uncertainty Decoding; 17.5 Model-Based Joint Uncertainty Decoding; 17.6 Noisy CMLLR; 17.7 Uncertainty and Adaptive Training; 17.8 In Combination with Other Techniques; 17.9 Conclusions

Content preview from Techniques for Noise Robustness in Automatic Speech Recognition

Adaptation and Discriminative Training of Acoustic Models

Yannick Estève, Paul Deléglise

University of Le Mans, France

11.1 Introduction

The main weakness of automatic speech recognition (ASR) systems is their lack of robustness to variability. All the knowledge bases used in an ASR system are affected by this problem: the dictionary (that is, the list of words the system can recognize, along with their pronunciation variants), the language models, and the acoustic models. These knowledge bases, and most particularly the language and acoustic models, which are probabilistic in nature, depend heavily on the data used to estimate their parameters. This dependence of probabilistic models on their training corpora is made more significant by the high cost of building such corpora. Because of that cost, probabilistic models are commonly used in application contexts that differ considerably from the context of their training data.

Such a mismatch between training data and application context causes the models to lose some of their precision and predictive power, which in turn degrades the quality of speech recognition. This is a well-known problem, and it has led to the development of many techniques that aim to lessen its impact. Model adaptation consists of reducing the mismatch between probabilistic models and the data to which they are applied.

Noise is a cause of mismatch: it constitutes a variable phenomenon with potentially ...

Publisher Resources

ISBN: 9781118392669
