
Techniques for Noise Robustness in Automatic Speech Recognition

by Rita Singh, Tuomas Virtanen, Bhiksha Raj

November 2012

Intermediate to advanced

514 pages

17h 40m

English

Wiley


1.1 Scope of the Book; 1.2 Outline; 1.3 Notation
2.1 Introduction; 2.2 Speech Recognition Viewed as Bayes Classification; 2.3 Hidden Markov Models; 2.4 HMM-Based Speech Recognition
3.1 Errors in Bayes Classification; 3.2 Bayes Classification and ASR; 3.3 External Influences on Speech Recordings; 3.4 The Effect of External Influences on Recognition; 3.5 Improving Recognition under Adverse Conditions

4.1 Introduction; 4.2 Signal Analysis and Synthesis; 4.3 Voice Activity Detection; 4.4 Noise Power Spectrum Estimation; 4.5 Adaptive Filters for Signal Enhancement; 4.6 ASR Performance; 4.7 Conclusions
5.1 The Problem with Mixtures; 5.2 Multichannel Mixtures; 5.3 Single-Channel Mixtures; 5.4 Variations and Extensions; 5.5 Conclusions
6.1 Speaker Tracking; 6.2 Conventional Microphone Arrays; 6.3 Conventional Adaptive Beamforming Algorithms; 6.4 Spherical Microphone Arrays; 6.5 Spherical Adaptive Algorithms; 6.6 Comparative Studies; 6.7 Comparison of Linear and Spherical Arrays for DSR; 6.8 Conclusions and Further Reading
7.1 Introduction; 7.2 The Speech Signal; 7.3 Spectral Processing; 7.4 Cepstral Processing; 7.5 Influence of Distortions on Different Speech Features; 7.6 Summary and Further Reading
8.1 Introduction; 8.2 Some Attributes of Auditory Physiology and Perception; 8.3 “Classic” Auditory Representations; 8.4 Current Trends in Auditory Feature Analysis; 8.5 Summary; Acknowledgments
9.1 Life in an Ideal World; 9.2 MMSE-SPLICE; 9.3 Discriminative SPLICE; 9.4 Model-Based Feature Enhancement; 9.5 Switching Linear Dynamic System; 9.6 Conclusion
10.1 Introduction; 10.2 The Effect of Reverberation; 10.3 Approaches to Reverberant Speech Recognition; 10.4 Feature Domain Model of the Acoustic Impulse Response; 10.5 Bayesian Feature Enhancement; 10.6 Experimental Results; 10.7 Conclusions; Acknowledgment
11.1 Introduction; 11.2 Acoustic Model Adaptation and Noise Robustness; 11.3 Maximum A Posteriori Reestimation; 11.4 Maximum Likelihood Linear Regression; 11.5 Discriminative Training; 11.6 Conclusion
12.1 Introduction; 12.2 The Model-Based Approach; 12.3 Signal Feature Domains; 12.4 Interaction Models; 12.5 Inference Methods; 12.6 Efficient Likelihood Evaluation in Factorial Models; 12.7 Current Directions
13.1 Introduction; 13.2 Traditional Training Methods for Robust Speech Recognition; 13.3 A Brief Overview of Speaker Adaptive Training; 13.4 Feature-Space Noise Adaptive Training; 13.5 Model-Space Noise Adaptive Training; 13.6 Noise Adaptive Training using VTS Adaptation; 13.7 Discussion; 13.8 Conclusion
14.1 Introduction; 14.2 Classification with Incomplete Data; 14.3 Energetic Masking; 14.4 Meta-Missing Data: Dealing with Mask Uncertainty; 14.5 Some Perspectives on Performance
15.1 Introduction; 15.2 Missing-Data Techniques; 15.3 Correlation-Based Imputation; 15.4 Cluster-Based Imputation; 15.5 Class-Conditioned Imputation; 15.6 Sparse Imputation; 15.7 Other Feature-Reconstruction Methods; 15.8 Experimental Results; 15.9 Discussion and Conclusion; Acknowledgments
16.1 Introduction; 16.2 Auditory Scene Analysis; 16.3 Computational Auditory Scene Analysis; 16.4 CASA Strategies; 16.5 Integrating CASA with ASR; 16.6 Concluding Remarks; Acknowledgment
17.1 Introduction; 17.2 Observation Uncertainty; 17.3 Uncertainty Decoding; 17.4 Feature-Based Uncertainty Decoding; 17.5 Model-Based Joint Uncertainty Decoding; 17.6 Noisy CMLLR; 17.7 Uncertainty and Adaptive Training; 17.8 In Combination with Other Techniques; 17.9 Conclusions

Introduction

Tuomas Virtanen¹, Rita Singh², Bhiksha Raj²

¹Tampere University of Technology, Finland; ²Carnegie Mellon University, USA

1.1 Scope of the Book

The term “computer speech recognition” conjures up visions of the science-fiction capabilities of HAL 9000 in 2001: A Space Odyssey, or of “Data,” the android in Star Trek, who can communicate through speech as easily as a human being. However, our real-life encounters with automatic speech recognition are usually far less impressive. They often consist of annoying exchanges with interactive voice response, dictation, and transcription systems that make many mistakes, frequently misrecognizing what is spoken in ways that humans rarely would. The reasons for these mistakes are many. Some stem from fundamental limitations of the mathematical framework employed, and from inadequate awareness or representation of context, world knowledge, and language. Other, equally important, sources of error are distortions introduced into the audio during recording, transmission, and storage.

As automatic speech recognition (ASR) systems find increasing use in everyday life, the speech they must recognize is being recorded under a wider variety of conditions than ever before. It may be recorded over a variety of channels, including landline and cellular phones, the internet, etc., using different kinds of microphones, which may be placed close to the mouth, such as in head-mounted microphones or telephone ...


ISBN: 9781118392669

You might also like

Audio Source Separation and Speech Enhancement

Hidden Semi-Markov Models

Parametric Time-Frequency Domain Spatial Audio

Robust Automatic Speech Recognition