Audio Source Separation and Speech Enhancement

by Emmanuel Vincent, Tuomas Virtanen, Sharon Gannot

October 2018

Intermediate to advanced

504 pages

18h 50m

English

Wiley

1.1 Why are Source Separation and Speech Enhancement Needed? · 1.2 What are the Goals of Source Separation and Speech Enhancement? · 1.3 How can Source Separation and Speech Enhancement be Addressed? · 1.4 Outline · Bibliography
2.1 Time‐Frequency Analysis and Synthesis · 2.2 Source Properties in the Time‐Frequency Domain · 2.3 Filtering in the Time‐Frequency Domain · 2.4 Summary · Bibliography
3.1 Formalization of the Mixing Process · 3.2 Microphone Recordings · 3.3 Artificial Mixtures · 3.4 Impulse Response Models · 3.5 Summary · Bibliography
4.1 Basic Notions in Multichannel Spatial Audio · 4.2 Multi‐Microphone Source Activity Detection · 4.3 Source Localization · 4.4 Summary · Bibliography
5.1 Time‐Frequency Masking · 5.2 Mask Estimation Given the Signal Statistics · 5.3 Perceptual Improvements · 5.4 Summary · Bibliography
6.1 Speech Presence Probability and its Estimation · 6.2 Noise Power Spectrum Tracking · 6.3 Evaluation Measures · 6.4 Summary · Bibliography
7.1 Source Separation by Computational Auditory Scene Analysis · 7.2 Source Separation by Factorial HMMs · 7.3 Separation Based Training · 7.4 Summary · Bibliography
8.1 NMF and Source Separation · 8.2 NMF Theory and Algorithms · 8.3 NMF Dictionary Learning Methods · 8.4 Advanced NMF Models · 8.5 Summary · Bibliography
9.1 Convolutive NMF · 9.2 Overview of Dynamical Models · 9.3 Smooth NMF · 9.4 Nonnegative State‐Space Models · 9.5 Discrete Dynamical Models · 9.6 The Use of Dynamic Models in Source Separation · 9.7 Which Model to Use? · 9.8 Summary · 9.9 Standard Distributions · Bibliography
10.1 Fundamentals of Array Processing · 10.2 Array Topologies · 10.3 Data‐Independent Beamforming · 10.4 Data‐Dependent Spatial Filters: Design Criteria · 10.5 Generalized Sidelobe Canceler Implementation · 10.6 Postfilters · 10.7 Summary · Bibliography
11.1 Multichannel Speech Presence Probability Estimators · 11.2 Covariance Matrix Estimators Exploiting SPP · 11.3 Methods for Weakly Guided and Strongly Guided RTF Estimation · 11.4 Summary · Bibliography
12.1 Two‐Channel Clustering · 12.2 Multichannel Clustering · 12.3 Multichannel Classification · 12.4 Spatial Filtering Based on Masks · 12.5 Summary · Bibliography
13.1 Convolutive Mixtures and their Time‐Frequency Representations · 13.2 Frequency‐Domain Independent Component Analysis · 13.3 Independent Vector Analysis · 13.4 Example · 13.5 Summary · Bibliography
14.1 Gaussian Modeling · 14.2 Library of Spectral and Spatial Models · 14.3 Parameter Estimation Criteria and Algorithms · 14.4 Detailed Presentation of Some Methods · 14.5 Summary · Acknowledgment · Bibliography
15.1 Introduction to Dereverberation · 15.2 Reverberation Cancellation Approaches · 15.3 Reverberation Suppression Approaches · 15.4 Direct Estimation · 15.5 Evaluation of Dereverberation · 15.6 Summary · Bibliography
16.1 Challenges and Opportunities · 16.2 Nonnegative Matrix Factorization in the Case of Music · 16.3 Taking Advantage of the Harmonic Structure of Music · 16.4 Nonparametric Local Models: Taking Advantage of Redundancies in Music · 16.5 Taking Advantage of Multiple Instances · 16.6 Interactive Source Separation · 16.7 Crowd‐Based Evaluation · 16.8 Some Examples of Applications · 16.9 Summary · Bibliography
17.1 Challenges and Opportunities · 17.2 Applications · 17.3 Robust Speech Analysis and Recognition · 17.4 Integration of Front‐End and Back‐End · 17.5 Use of Multimodal Information with Source Separation · 17.6 Summary · Bibliography
18.1 Introduction to Binaural Processing · 18.2 Binaural Hearing · 18.3 Binaural Noise Reduction Paradigms · 18.4 The Binaural Noise Reduction Problem · 18.5 Extensions for Diffuse Noise · 18.6 Extensions for Interfering Sources · 18.7 Summary · Bibliography
19.1 Advancing Deep Learning · 19.2 Exploiting Phase Relationships · 19.3 Advancing Multichannel Processing · 19.4 Addressing Multiple‐Device Scenarios · 19.5 Towards Widespread Commercial Use · Acknowledgment · Bibliography

Content preview from Audio Source Separation and Speech Enhancement

12 Multichannel Clustering and Classification Approaches

Michael I. Mandel, Shoko Araki, and Tomohiro Nakatani

This chapter describes methods for estimating time‐frequency masks of source activity from multichannel observations using clustering and classification techniques. Such methods are similar to the speech presence probability (SPP) estimates in Chapter 11, but can be applied to any signal, not just speech, and can be applied in the presence of nonstationary noise, not just stationary noise. Clustering algorithms estimate time‐frequency masks by grouping together time‐frequency bins with similar characteristics. Classification algorithms estimate these masks based on a comparison of time‐frequency bins in the signal under analysis to those of previously seen training data. Because clustering algorithms only compare parts of the test signal to one another, they typically do not require training data. Classification algorithms, in contrast, are extremely dependent on the characteristics and quality of their training data. In the notation of Section 1.3.3, clustering is generally a learning‐free method, while classification is a separation‐based training method.
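As a rough illustration of the clustering idea described above, the following Python sketch groups the time‐frequency bins of a two‐channel signal by their interchannel level and phase differences and turns the resulting groups into binary masks. It is a minimal sketch under simplifying assumptions, not the chapter's own algorithm: the function names (`spatial_features`, `kmeans`, `cluster_masks`) are hypothetical, plain k‐means with a farthest‐point initialization stands in for whatever clustering model a real system would use, and the toy mixture is noise‐free, has exactly one active source per bin, and uses frequency‐independent mixing gains.

```python
import numpy as np

def spatial_features(X1, X2, eps=1e-12):
    """Per-bin interchannel level difference (dB) and phase difference (as cos/sin)."""
    ratio = (X2 + eps) / (X1 + eps)
    ild = 20.0 * np.log10(np.abs(ratio))
    ipd = np.angle(ratio)
    # cos/sin avoids the wrap-around of the raw phase at +/- pi
    return np.stack([ild.ravel(), np.cos(ipd).ravel(), np.sin(ipd).ravel()], axis=1)

def kmeans(feats, n_clusters, n_iter=20):
    """Plain Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [feats[0]]
    for _ in range(1, n_clusters):
        d = np.min([((feats - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(feats[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = feats[labels == k].mean(axis=0)
    return labels

def cluster_masks(X1, X2, n_sources):
    """Return one binary time-frequency mask per cluster, shape (n_sources, F, T)."""
    labels = kmeans(spatial_features(X1, X2), n_sources)
    return np.stack([(labels == k).reshape(X1.shape) for k in range(n_sources)])

# Toy two-channel "STFT" (assumed data): each bin is dominated by one of two
# sources, and each source has its own complex gain between the two channels.
rng = np.random.default_rng(0)
F, T = 64, 50
active0 = rng.random((F, T)) < 0.5          # True where source 0 is active
S = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
gain = np.where(active0, 2.0 * np.exp(0.5j), 0.5 * np.exp(-0.5j))
X1, X2 = S, gain * S

masks = cluster_masks(X1, X2, n_sources=2)
```

No training data is involved here: the bins of the test signal are only compared with one another, which is what makes the approach learning‐free. A classification approach would instead fit a model to features from labeled training mixtures and apply it to the bins of the test signal.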

This chapter is also related to Chapter 14, which describes a complete generative model of the joint spatial and time‐frequency characteristics of multichannel signals that can be used to separate or enhance target signals of interest. The methods described in the current chapter, in contrast, focus on estimating only the ...

ISBN: 9781119279891
