
Audio Source Separation and Speech Enhancement

Book Description

Learn the technology behind hearing aids, Siri, and Echo 

Audio source separation and speech enhancement aim to extract one or more source signals of interest from an audio recording involving several sound sources. These technologies are among the most studied in audio signal processing today and play a critical role in the success of hearing aids, hands-free phones, voice command and other noise-robust audio analysis systems, and music post-production software.

Research on this topic has followed three convergent paths: sensor array processing, computational auditory scene analysis, and machine learning based approaches such as independent component analysis. This book is the first to provide a comprehensive overview of the field, presenting the common foundations of these techniques and the differences between them in a unified setting.

Key features:

  • Consolidated perspective on audio source separation and speech enhancement.
  • Both a historical perspective and the latest advances in the field, e.g., deep neural networks.
  • Diverse disciplines: array processing, machine learning, and statistical signal processing.
  • Covers the most important techniques for both single-channel and multichannel processing.

This book provides both introductory and advanced material suitable for readers with basic knowledge of signal processing and machine learning. Thanks to its comprehensiveness, it will help students select a promising research track, researchers leverage cross-domain knowledge to design improved techniques, and engineers and developers choose the right technology for their target application scenario. It will also be useful for practitioners from other fields (e.g., acoustics, multimedia, phonetics, and musicology) who wish to use audio source separation or speech enhancement as pre-processing tools for their own needs.

Table of Contents

  1. Cover
  2. List of Authors
  3. Preface
  4. Acknowledgment
  5. Notations
  6. Acronyms
  7. About the Companion Website
  8. Part I: Prerequisites
    1. Chapter 1: Introduction
      1. 1.1 Why are Source Separation and Speech Enhancement Needed?
      2. 1.2 What are the Goals of Source Separation and Speech Enhancement?
      3. 1.3 How can Source Separation and Speech Enhancement be Addressed?
      4. 1.4 Outline
      5. Bibliography
    2. Chapter 2: Time‐Frequency Processing: Spectral Properties
      1. 2.1 Time‐Frequency Analysis and Synthesis
      2. 2.2 Source Properties in the Time‐Frequency Domain
      3. 2.3 Filtering in the Time‐Frequency Domain
      4. 2.4 Summary
      5. Bibliography
    3. Chapter 3: Acoustics: Spatial Properties
      1. 3.1 Formalization of the Mixing Process
      2. 3.2 Microphone Recordings
      3. 3.3 Artificial Mixtures
      4. 3.4 Impulse Response Models
      5. 3.5 Summary
      6. Bibliography
    4. Chapter 4: Multichannel Source Activity Detection, Localization, and Tracking
      1. 4.1 Basic Notions in Multichannel Spatial Audio
      2. 4.2 Multi‐Microphone Source Activity Detection
      3. 4.3 Source Localization
      4. 4.4 Summary
      5. Bibliography
  9. Part II: Single‐Channel Separation and Enhancement
    1. Chapter 5: Spectral Masking and Filtering
      1. 5.1 Time‐Frequency Masking
      2. 5.2 Mask Estimation Given the Signal Statistics
      3. 5.3 Perceptual Improvements
      4. 5.4 Summary
      5. Bibliography
    2. Chapter 6: Single‐Channel Speech Presence Probability Estimation and Noise Tracking
      1. 6.1 Speech Presence Probability and its Estimation
      2. 6.2 Noise Power Spectrum Tracking
      3. 6.3 Evaluation Measures
      4. 6.4 Summary
      5. Bibliography
    3. Chapter 7: Single‐Channel Classification and Clustering Approaches
      1. 7.1 Source Separation by Computational Auditory Scene Analysis
      2. 7.2 Source Separation by Factorial HMMs
      3. 7.3 Separation Based Training
      4. 7.4 Summary
      5. Bibliography
    4. Chapter 8: Nonnegative Matrix Factorization
      1. 8.1 NMF and Source Separation
      2. 8.2 NMF Theory and Algorithms
      3. 8.3 NMF Dictionary Learning Methods
      4. 8.4 Advanced NMF Models
      5. 8.5 Summary
      6. Bibliography
    5. Chapter 9: Temporal Extensions of Nonnegative Matrix Factorization
      1. 9.1 Convolutive NMF
      2. 9.2 Overview of Dynamical Models
      3. 9.3 Smooth NMF
      4. 9.4 Nonnegative State‐Space Models
      5. 9.5 Discrete Dynamical Models
      6. 9.6 The Use of Dynamic Models in Source Separation
      7. 9.7 Which Model to Use?
      8. 9.8 Summary
      9. 9.9 Standard Distributions
      10. Bibliography
  10. Part III: Multichannel Separation and Enhancement
    1. Chapter 10: Spatial Filtering
      1. 10.1 Fundamentals of Array Processing
      2. 10.2 Array Topologies
      3. 10.3 Data‐Independent Beamforming
      4. 10.4 Data‐Dependent Spatial Filters: Design Criteria
      5. 10.5 Generalized Sidelobe Canceler Implementation
      6. 10.6 Postfilters
      7. 10.7 Summary
      8. Bibliography
    2. Chapter 11: Multichannel Parameter Estimation
      1. 11.1 Multichannel Speech Presence Probability Estimators
      2. 11.2 Covariance Matrix Estimators Exploiting SPP
      3. 11.3 Methods for Weakly Guided and Strongly Guided RTF Estimation
      4. 11.4 Summary
      5. Bibliography
    3. Chapter 12: Multichannel Clustering and Classification Approaches
      1. 12.1 Two‐Channel Clustering
      2. 12.2 Multichannel Clustering
      3. 12.3 Multichannel Classification
      4. 12.4 Spatial Filtering Based on Masks
      5. 12.5 Summary
      6. Bibliography
    4. Chapter 13: Independent Component and Vector Analysis
      1. 13.1 Convolutive Mixtures and their Time‐Frequency Representations
      2. 13.2 Frequency‐Domain Independent Component Analysis
      3. 13.3 Independent Vector Analysis
      4. 13.4 Example
      5. 13.5 Summary
      6. Bibliography
    5. Chapter 14: Gaussian Model Based Multichannel Separation
      1. 14.1 Gaussian Modeling
      2. 14.2 Library of Spectral and Spatial Models
      3. 14.3 Parameter Estimation Criteria and Algorithms
      4. 14.4 Detailed Presentation of Some Methods
      5. 14.5 Summary
      6. Acknowledgment
      7. Bibliography
    6. Chapter 15: Dereverberation
      1. 15.1 Introduction to Dereverberation
      2. 15.2 Reverberation Cancellation Approaches
      3. 15.3 Reverberation Suppression Approaches
      4. 15.4 Direct Estimation
      5. 15.5 Evaluation of Dereverberation
      6. 15.6 Summary
      7. Bibliography
  11. Part IV: Application Scenarios and Perspectives
    1. Chapter 16: Applying Source Separation to Music
      1. 16.1 Challenges and Opportunities
      2. 16.2 Nonnegative Matrix Factorization in the Case of Music
      3. 16.3 Taking Advantage of the Harmonic Structure of Music
      4. 16.4 Nonparametric Local Models: Taking Advantage of Redundancies in Music
      5. 16.5 Taking Advantage of Multiple Instances
      6. 16.6 Interactive Source Separation
      7. 16.7 Crowd‐Based Evaluation
      8. 16.8 Some Examples of Applications
      9. 16.9 Summary
      10. Bibliography
    2. Chapter 17: Application of Source Separation to Robust Speech Analysis and Recognition
      1. 17.1 Challenges and Opportunities
      2. 17.2 Applications
      3. 17.3 Robust Speech Analysis and Recognition
      4. 17.4 Integration of Front‐End and Back‐End
      5. 17.5 Use of Multimodal Information with Source Separation
      6. 17.6 Summary
      7. Bibliography
    3. Chapter 18: Binaural Speech Processing with Application to Hearing Devices
      1. 18.1 Introduction to Binaural Processing
      2. 18.2 Binaural Hearing
      3. 18.3 Binaural Noise Reduction Paradigms
      4. 18.4 The Binaural Noise Reduction Problem
      5. 18.5 Extensions for Diffuse Noise
      6. 18.6 Extensions for Interfering Sources
      7. 18.7 Summary
      8. Bibliography
    4. Chapter 19: Perspectives
      1. 19.1 Advancing Deep Learning
      2. 19.2 Exploiting Phase Relationships
      3. 19.3 Advancing Multichannel Processing
      4. 19.4 Addressing Multiple‐Device Scenarios
      5. 19.5 Towards Widespread Commercial Use
      6. Acknowledgment
      7. Bibliography
  12. Index
  13. End User License Agreement