Audio Source Separation and Speech Enhancement

by Emmanuel Vincent, Tuomas Virtanen, Sharon Gannot

October 2018

Intermediate to advanced

504 pages

18h 50m

English

Wiley

1.1 Why are Source Separation and Speech Enhancement Needed?; 1.2 What are the Goals of Source Separation and Speech Enhancement?; 1.3 How can Source Separation and Speech Enhancement be Addressed?; 1.4 Outline; Bibliography
2.1 Time‐Frequency Analysis and Synthesis; 2.2 Source Properties in the Time‐Frequency Domain; 2.3 Filtering in the Time‐Frequency Domain; 2.4 Summary; Bibliography
3.1 Formalization of the Mixing Process; 3.2 Microphone Recordings; 3.3 Artificial Mixtures; 3.4 Impulse Response Models; 3.5 Summary; Bibliography
4.1 Basic Notions in Multichannel Spatial Audio; 4.2 Multi‐Microphone Source Activity Detection; 4.3 Source Localization; 4.4 Summary; Bibliography
5.1 Time‐Frequency Masking; 5.2 Mask Estimation Given the Signal Statistics; 5.3 Perceptual Improvements; 5.4 Summary; Bibliography
6.1 Speech Presence Probability and its Estimation; 6.2 Noise Power Spectrum Tracking; 6.3 Evaluation Measures; 6.4 Summary; Bibliography
7.1 Source Separation by Computational Auditory Scene Analysis; 7.2 Source Separation by Factorial HMMs; 7.3 Separation Based Training; 7.4 Summary; Bibliography
8.1 NMF and Source Separation; 8.2 NMF Theory and Algorithms; 8.3 NMF Dictionary Learning Methods; 8.4 Advanced NMF Models; 8.5 Summary; Bibliography
9.1 Convolutive NMF; 9.2 Overview of Dynamical Models; 9.3 Smooth NMF; 9.4 Nonnegative State‐Space Models; 9.5 Discrete Dynamical Models; 9.6 The Use of Dynamic Models in Source Separation; 9.7 Which Model to Use?; 9.8 Summary; 9.9 Standard Distributions; Bibliography
10.1 Fundamentals of Array Processing; 10.2 Array Topologies; 10.3 Data‐Independent Beamforming; 10.4 Data‐Dependent Spatial Filters: Design Criteria; 10.5 Generalized Sidelobe Canceler Implementation; 10.6 Postfilters; 10.7 Summary; Bibliography
11.1 Multichannel Speech Presence Probability Estimators; 11.2 Covariance Matrix Estimators Exploiting SPP; 11.3 Methods for Weakly Guided and Strongly Guided RTF Estimation; 11.4 Summary; Bibliography
12.1 Two‐Channel Clustering; 12.2 Multichannel Clustering; 12.3 Multichannel Classification; 12.4 Spatial Filtering Based on Masks; 12.5 Summary; Bibliography
13.1 Convolutive Mixtures and their Time‐Frequency Representations; 13.2 Frequency‐Domain Independent Component Analysis; 13.3 Independent Vector Analysis; 13.4 Example; 13.5 Summary; Bibliography
14.1 Gaussian Modeling; 14.2 Library of Spectral and Spatial Models; 14.3 Parameter Estimation Criteria and Algorithms; 14.4 Detailed Presentation of Some Methods; 14.5 Summary; Acknowledgment; Bibliography
15.1 Introduction to Dereverberation; 15.2 Reverberation Cancellation Approaches; 15.3 Reverberation Suppression Approaches; 15.4 Direct Estimation; 15.5 Evaluation of Dereverberation; 15.6 Summary; Bibliography
16.1 Challenges and Opportunities; 16.2 Nonnegative Matrix Factorization in the Case of Music; 16.3 Taking Advantage of the Harmonic Structure of Music; 16.4 Nonparametric Local Models: Taking Advantage of Redundancies in Music; 16.5 Taking Advantage of Multiple Instances; 16.6 Interactive Source Separation; 16.7 Crowd‐Based Evaluation; 16.8 Some Examples of Applications; 16.9 Summary; Bibliography
17.1 Challenges and Opportunities; 17.2 Applications; 17.3 Robust Speech Analysis and Recognition; 17.4 Integration of Front‐End and Back‐End; 17.5 Use of Multimodal Information with Source Separation; 17.6 Summary; Bibliography
18.1 Introduction to Binaural Processing; 18.2 Binaural Hearing; 18.3 Binaural Noise Reduction Paradigms; 18.4 The Binaural Noise Reduction Problem; 18.5 Extensions for Diffuse Noise; 18.6 Extensions for Interfering Sources; 18.7 Summary; Bibliography
19.1 Advancing Deep Learning; 19.2 Exploiting Phase Relationships; 19.3 Advancing Multichannel Processing; 19.4 Addressing Multiple‐Device Scenarios; 19.5 Towards Widespread Commercial Use; Acknowledgment; Bibliography

Content preview from Audio Source Separation and Speech Enhancement

17 Application of Source Separation to Robust Speech Analysis and Recognition

Shinji Watanabe, Tuomas Virtanen, and Dorothea Kolossa

This chapter describes applications of source separation techniques to robust speech analysis and recognition, including automatic speech recognition (ASR), speaker/language identification, emotion and paralinguistic analysis, and audiovisual analysis. These are the most successful applications in audio and speech processing, with commercial products including Google Voice Search, Apple Siri, Amazon Echo, and Microsoft Cortana. Robustness against noise and nontarget speech nevertheless remains a challenge, and source separation and speech enhancement techniques are attracting considerable attention in the speech community.

This chapter systematically describes how source separation and speech enhancement techniques are applied to improve the robustness of these applications. It first describes the challenges and opportunities in Section 17.1, and defines the considered speech analysis and recognition applications with basic formulations in Section 17.2. Section 17.3 describes the current state‐of‐the‐art system using source separation as a front‐end method for speech analysis and recognition. Section 17.4 introduces a way of tightly integrating these methods by preserving the uncertainties between them. Section 17.5 provides another possible solution to the robustness issues with the help of cross‐modality information. Section 17.6 concludes the chapter. ...

ISBN: 9781119279891
