Audio Source Separation and Speech Enhancement

by Emmanuel Vincent, Tuomas Virtanen, Sharon Gannot

October 2018

Intermediate to advanced

504 pages

18h 50m

English

Wiley

1.1 Why are Source Separation and Speech Enhancement Needed?
1.2 What are the Goals of Source Separation and Speech Enhancement?
1.3 How can Source Separation and Speech Enhancement be Addressed?
1.4 Outline
Bibliography

2.1 Time‐Frequency Analysis and Synthesis
2.2 Source Properties in the Time‐Frequency Domain
2.3 Filtering in the Time‐Frequency Domain
2.4 Summary
Bibliography

3.1 Formalization of the Mixing Process
3.2 Microphone Recordings
3.3 Artificial Mixtures
3.4 Impulse Response Models
3.5 Summary
Bibliography

4.1 Basic Notions in Multichannel Spatial Audio
4.2 Multi‐Microphone Source Activity Detection
4.3 Source Localization
4.4 Summary
Bibliography

5.1 Time‐Frequency Masking
5.2 Mask Estimation Given the Signal Statistics
5.3 Perceptual Improvements
5.4 Summary
Bibliography

6.1 Speech Presence Probability and its Estimation
6.2 Noise Power Spectrum Tracking
6.3 Evaluation Measures
6.4 Summary
Bibliography

7.1 Source Separation by Computational Auditory Scene Analysis
7.2 Source Separation by Factorial HMMs
7.3 Separation Based Training
7.4 Summary
Bibliography

8.1 NMF and Source Separation
8.2 NMF Theory and Algorithms
8.3 NMF Dictionary Learning Methods
8.4 Advanced NMF Models
8.5 Summary
Bibliography

9.1 Convolutive NMF
9.2 Overview of Dynamical Models
9.3 Smooth NMF
9.4 Nonnegative State‐Space Models
9.5 Discrete Dynamical Models
9.6 The Use of Dynamic Models in Source Separation
9.7 Which Model to Use?
9.8 Summary
9.9 Standard Distributions
Bibliography

10.1 Fundamentals of Array Processing
10.2 Array Topologies
10.3 Data‐Independent Beamforming
10.4 Data‐Dependent Spatial Filters: Design Criteria
10.5 Generalized Sidelobe Canceler Implementation
10.6 Postfilters
10.7 Summary
Bibliography

11.1 Multichannel Speech Presence Probability Estimators
11.2 Covariance Matrix Estimators Exploiting SPP
11.3 Methods for Weakly Guided and Strongly Guided RTF Estimation
11.4 Summary
Bibliography

12.1 Two‐Channel Clustering
12.2 Multichannel Clustering
12.3 Multichannel Classification
12.4 Spatial Filtering Based on Masks
12.5 Summary
Bibliography

13.1 Convolutive Mixtures and their Time‐Frequency Representations
13.2 Frequency‐Domain Independent Component Analysis
13.3 Independent Vector Analysis
13.4 Example
13.5 Summary
Bibliography

14.1 Gaussian Modeling
14.2 Library of Spectral and Spatial Models
14.3 Parameter Estimation Criteria and Algorithms
14.4 Detailed Presentation of Some Methods
14.5 Summary
Acknowledgment
Bibliography

15.1 Introduction to Dereverberation
15.2 Reverberation Cancellation Approaches
15.3 Reverberation Suppression Approaches
15.4 Direct Estimation
15.5 Evaluation of Dereverberation
15.6 Summary
Bibliography

16.1 Challenges and Opportunities
16.2 Nonnegative Matrix Factorization in the Case of Music
16.3 Taking Advantage of the Harmonic Structure of Music
16.4 Nonparametric Local Models: Taking Advantage of Redundancies in Music
16.5 Taking Advantage of Multiple Instances
16.6 Interactive Source Separation
16.7 Crowd‐Based Evaluation
16.8 Some Examples of Applications
16.9 Summary
Bibliography

17.1 Challenges and Opportunities
17.2 Applications
17.3 Robust Speech Analysis and Recognition
17.4 Integration of Front‐End and Back‐End
17.5 Use of Multimodal Information with Source Separation
17.6 Summary
Bibliography

18.1 Introduction to Binaural Processing
18.2 Binaural Hearing
18.3 Binaural Noise Reduction Paradigms
18.4 The Binaural Noise Reduction Problem
18.5 Extensions for Diffuse Noise
18.6 Extensions for Interfering Sources
18.7 Summary
Bibliography

19.1 Advancing Deep Learning
19.2 Exploiting Phase Relationships
19.3 Advancing Multichannel Processing
19.4 Addressing Multiple‐Device Scenarios
19.5 Towards Widespread Commercial Use
Acknowledgment
Bibliography

Content preview from Audio Source Separation and Speech Enhancement

8 Nonnegative Matrix Factorization

Roland Badeau and Tuomas Virtanen

Nonnegative matrix factorization (NMF) refers to a set of techniques that have been used to model the spectra of sound sources in various audio applications, including source separation. Sound sources have a structure in time and frequency: music consists of basic units like notes and chords played by different instruments, speech consists of elementary units such as phonemes, syllables, or words, and environmental sounds consist of sound events produced by various sound sources. NMF models this structure by representing the spectrum of a sound as a sum of components, each with a fixed spectrum and a time‐varying gain, so that each component in the model represents one of these elementary units.
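The "fixed spectrum, time‐varying gain" idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the chapter's own formulation: the symbols V (nonnegative spectrogram), W (component spectra), and H (gains) follow common convention, the function names `nmf_kl` and `kl_divergence` are hypothetical, and the multiplicative updates for the Kullback-Leibler divergence are just one widely used choice of algorithm.

```python
import numpy as np

def nmf_kl(V, n_components, n_iter=200, seed=0, eps=1e-12):
    """Approximate a nonnegative matrix V (frequency x time) as W @ H.

    Column k of W is the fixed spectrum of component k;
    row k of H is its time-varying gain.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    # Random positive initialization (an assumption; other schemes exist).
    W = rng.random((n_freq, n_components)) + eps
    H = rng.random((n_components, n_frames)) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        # Multiplicative update of the gains with the spectra held fixed.
        H *= (W.T @ (V / (W @ H + eps))) / (W.T @ ones + eps)
        # Multiplicative update of the spectra with the gains held fixed.
        W *= ((V / (W @ H + eps)) @ H.T) / (ones @ H.T + eps)
    return W, H

def kl_divergence(V, V_hat, eps=1e-12):
    """Generalized Kullback-Leibler divergence between V and its model."""
    return float(np.sum(V * np.log((V + eps) / (V_hat + eps)) - V + V_hat))

# Toy "spectrogram" built from two components with known spectra and gains.
rng = np.random.default_rng(1)
W_true = rng.random((64, 2))
H_true = rng.random((2, 100))
V = W_true @ H_true

W, H = nmf_kl(V, n_components=2)
print("KL divergence after fitting:", kl_divergence(V, W @ H))
```

Because the updates only multiply by nonnegative ratios, W and H stay nonnegative throughout, which is what lets each column of W be read as a spectrum and each row of H as a gain.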

Modeling this structure is beneficial in source separation, since inferring it makes it possible to exploit contextual information. NMF is typically used to model the magnitude or power spectrogram of audio signals, and its ability to represent the structure of audio sources makes separation possible even in single‐channel scenarios.
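One common way to turn such a model into a single‐channel separator, sketched here under stated assumptions rather than as the chapter's method, is to assume each source already has its own dictionary of component spectra, estimate the gains of all components on the mixture spectrogram, and then split the mixture with soft masks. The names `estimate_gains` and `separate`, the toy dictionaries, and the choice of Kullback-Leibler updates are all illustrative.

```python
import numpy as np

def estimate_gains(V, W, n_iter=200, seed=0, eps=1e-12):
    """Estimate nonnegative gains H so that W @ H approximates V, with W fixed."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    denom = W.T @ np.ones_like(V) + eps
    for _ in range(n_iter):
        # Multiplicative update for the Kullback-Leibler divergence.
        H *= (W.T @ (V / (W @ H + eps))) / denom
    return H

def separate(V_mix, dictionaries, eps=1e-12):
    """Split a mixture spectrogram into one estimate per source dictionary."""
    W = np.hstack(dictionaries)           # stack all component spectra
    H = estimate_gains(V_mix, W)
    model = W @ H + eps                   # model of the whole mixture
    estimates, start = [], 0
    for W_s in dictionaries:
        stop = start + W_s.shape[1]
        source_model = W_s @ H[start:stop]
        # Soft mask: the share of the mixture model explained by this source.
        estimates.append(V_mix * source_model / model)
        start = stop
    return estimates

# Toy example: two sources, each generated from its own (known) dictionary.
rng = np.random.default_rng(0)
W_a, W_b = rng.random((64, 3)), rng.random((64, 3))
V_a = W_a @ rng.random((3, 50))
V_b = W_b @ rng.random((3, 50))
V_mix = V_a + V_b

est_a, est_b = separate(V_mix, [W_a, W_b])
```

The masks sum to one in every time‐frequency bin, so the source estimates add back up to the mixture. In practice the dictionaries would have to be learned from data, and the estimated spectrograms would still need phase information to be converted back to waveforms.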

This chapter presents the use of NMF‐based single‐channel techniques. In Section 8.1 we introduce the basic NMF model used in various single‐channel source separation scenarios. In Section 8.2, several deterministic and probabilistic frameworks for NMF are presented, along with various NMF algorithms. Then several methods that can be used to learn NMF components by using suitable ...

Publisher Resources

ISBN: 9781119279891
