Robust Automatic Speech Recognition

Book description

Robust Automatic Speech Recognition: A Bridge to Practical Applications establishes a solid foundation for automatic speech recognition that is robust against acoustic environmental distortion. It provides a thorough overview of the classical and modern noise- and reverberation-robust techniques developed over the past thirty years. It emphasizes practical methods that have proven successful and are likely to be developed further for future applications.

The strengths and weaknesses of robustness-enhancing speech recognition techniques are carefully analyzed. The book covers noise-robust techniques designed for acoustic models based on both Gaussian mixture models and deep neural networks. It also provides a guide to selecting the best methods for practical applications. The reader will:

  • Gain a unified, deep and systematic understanding of the state-of-the-art technologies for robust speech recognition
  • Learn the links and relationships between alternative technologies for robust speech recognition
  • Be able to use the technology analysis and categorization detailed in the book to guide future technology development
  • Be able to develop new noise-robust methods in the current era of deep learning for acoustic modeling in speech recognition

Key features:

  • The first book to provide a comprehensive review of noise- and reverberation-robust speech recognition methods in the era of deep neural networks
  • Connects robust speech recognition techniques to machine learning paradigms with rigorous mathematical treatment
  • Provides elegant and structured ways to categorize and analyze noise-robust speech recognition techniques
  • Written by leading researchers who have worked on the subject for many years in both industrial and academic organizations

Table of contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. About the Authors
  6. List of Figures
  7. List of Tables
  8. Acronyms
  9. Notations
  10. Chapter 1: Introduction
    1. Abstract
    2. 1.1 Automatic Speech Recognition
    3. 1.2 Robustness to Noisy Environments
    4. 1.3 Existing Surveys in the Area
    5. 1.4 Book Structure Overview
  11. Chapter 2: Fundamentals of speech recognition
    1. Abstract
    2. 2.1 Introduction: Components of Speech Recognition
    3. 2.2 Gaussian Mixture Models
    4. 2.3 Hidden Markov Models and the Variants
    5. 2.4 Deep Learning and Deep Neural Networks
    6. 2.5 Summary
  12. Chapter 3: Background of robust speech recognition
    1. Abstract
    2. 3.1 Standard Evaluation Databases
    3. 3.2 Modeling Distortions of Speech in Acoustic Environments
    4. 3.3 Impact of Acoustic Distortion on Gaussian Modeling
    5. 3.4 Impact of Acoustic Distortion on DNN Modeling
    6. 3.5 A General Framework for Robust Speech Recognition
    7. 3.6 Categorizing Robust ASR Techniques: An Overview
    8. 3.7 Summary
  13. Chapter 4: Processing in the feature and model domains
    1. Abstract
    2. 4.1 Feature-Space Approaches
    3. 4.2 Model-Space Approaches
    4. 4.3 Summary
  14. Chapter 5: Compensation with prior knowledge
    1. Abstract
    2. 5.1 Learning from Stereo Data
    3. 5.2 Learning from Multi-Environment Data
    4. 5.3 Summary
  15. Chapter 6: Explicit distortion modeling
    1. Abstract
    2. 6.1 Parallel Model Combination
    3. 6.2 Vector Taylor Series
    4. 6.3 Sampling-Based Methods
    5. 6.4 Acoustic Factorization
    6. 6.5 Summary
  16. Chapter 7: Uncertainty processing
    1. Abstract
    2. 7.1 Model-Domain Uncertainty
    3. 7.2 Feature-Domain Uncertainty
    4. 7.3 Joint Uncertainty Decoding
    5. 7.4 Missing-Feature Approaches
    6. 7.5 Summary
  17. Chapter 8: Joint model training
    1. Abstract
    2. 8.1 Speaker Adaptive and Source Normalization Training
    3. 8.2 Model Space Noise Adaptive Training
    4. 8.3 Joint Training for DNN
    5. 8.4 Summary
  18. Chapter 9: Reverberant speech recognition
    1. Abstract
    2. 9.1 Introduction
    3. 9.2 Acoustic Impulse Response
    4. 9.3 A Model of Reverberated Speech in Different Domains
    5. 9.4 The Effect of Reverberation on ASR Performance
    6. 9.5 Linear Filtering Approaches
    7. 9.6 Magnitude or Power Spectrum Enhancement
    8. 9.7 Feature Domain Approaches
    9. 9.8 Acoustic Model Domain Approaches
    10. 9.9 The REVERB Challenge
    11. 9.10 To Probe Further
    12. 9.11 Summary
  19. Chapter 10: Multi-channel processing
    1. Abstract
    2. 10.1 Introduction
    3. 10.2 The Acoustic Beamforming Problem
    4. 10.3 Fundamentals of Data-Dependent Beamforming
    5. 10.4 Multi-Channel Speech Recognition
    6. 10.5 To Probe Further
    7. 10.6 Summary
  20. Chapter 11: Summary and future directions
    1. Abstract
    2. 11.1 Robust Methods in the Era of GMM
    3. 11.2 Robust Methods in the Era of DNN
    4. 11.3 Multi-Channel Input and Robustness to Reverberation
    5. 11.4 Epilogue
  21. Index

Product information

  • Title: Robust Automatic Speech Recognition
  • Author(s): Jinyu Li, Li Deng, Reinhold Haeb-Umbach, Yifan Gong
  • Release date: October 2015
  • Publisher(s): Academic Press
  • ISBN: 9780128026168