Audio Source Separation and Speech Enhancement

by Emmanuel Vincent, Tuomas Virtanen, Sharon Gannot

October 2018

Intermediate to advanced

504 pages

18h 50m

English

Wiley

1.1 Why are Source Separation and Speech Enhancement Needed?
1.2 What are the Goals of Source Separation and Speech Enhancement?
1.3 How can Source Separation and Speech Enhancement be Addressed?
1.4 Outline
Bibliography

2.1 Time‐Frequency Analysis and Synthesis
2.2 Source Properties in the Time‐Frequency Domain
2.3 Filtering in the Time‐Frequency Domain
2.4 Summary
Bibliography

3.1 Formalization of the Mixing Process
3.2 Microphone Recordings
3.3 Artificial Mixtures
3.4 Impulse Response Models
3.5 Summary
Bibliography

4.1 Basic Notions in Multichannel Spatial Audio
4.2 Multi‐Microphone Source Activity Detection
4.3 Source Localization
4.4 Summary
Bibliography

5.1 Time‐Frequency Masking
5.2 Mask Estimation Given the Signal Statistics
5.3 Perceptual Improvements
5.4 Summary
Bibliography

6.1 Speech Presence Probability and its Estimation
6.2 Noise Power Spectrum Tracking
6.3 Evaluation Measures
6.4 Summary
Bibliography

7.1 Source Separation by Computational Auditory Scene Analysis
7.2 Source Separation by Factorial HMMs
7.3 Separation Based Training
7.4 Summary
Bibliography

8.1 NMF and Source Separation
8.2 NMF Theory and Algorithms
8.3 NMF Dictionary Learning Methods
8.4 Advanced NMF Models
8.5 Summary
Bibliography

9.1 Convolutive NMF
9.2 Overview of Dynamical Models
9.3 Smooth NMF
9.4 Nonnegative State‐Space Models
9.5 Discrete Dynamical Models
9.6 The Use of Dynamic Models in Source Separation
9.7 Which Model to Use?
9.8 Summary
9.9 Standard Distributions
Bibliography

10.1 Fundamentals of Array Processing
10.2 Array Topologies
10.3 Data‐Independent Beamforming
10.4 Data‐Dependent Spatial Filters: Design Criteria
10.5 Generalized Sidelobe Canceler Implementation
10.6 Postfilters
10.7 Summary
Bibliography

11.1 Multichannel Speech Presence Probability Estimators
11.2 Covariance Matrix Estimators Exploiting SPP
11.3 Methods for Weakly Guided and Strongly Guided RTF Estimation
11.4 Summary
Bibliography

12.1 Two‐Channel Clustering
12.2 Multichannel Clustering
12.3 Multichannel Classification
12.4 Spatial Filtering Based on Masks
12.5 Summary
Bibliography

13.1 Convolutive Mixtures and their Time‐Frequency Representations
13.2 Frequency‐Domain Independent Component Analysis
13.3 Independent Vector Analysis
13.4 Example
13.5 Summary
Bibliography

14.1 Gaussian Modeling
14.2 Library of Spectral and Spatial Models
14.3 Parameter Estimation Criteria and Algorithms
14.4 Detailed Presentation of Some Methods
14.5 Summary
Acknowledgment
Bibliography

15.1 Introduction to Dereverberation
15.2 Reverberation Cancellation Approaches
15.3 Reverberation Suppression Approaches
15.4 Direct Estimation
15.5 Evaluation of Dereverberation
15.6 Summary
Bibliography

16.1 Challenges and Opportunities
16.2 Nonnegative Matrix Factorization in the Case of Music
16.3 Taking Advantage of the Harmonic Structure of Music
16.4 Nonparametric Local Models: Taking Advantage of Redundancies in Music
16.5 Taking Advantage of Multiple Instances
16.6 Interactive Source Separation
16.7 Crowd‐Based Evaluation
16.8 Some Examples of Applications
16.9 Summary
Bibliography

17.1 Challenges and Opportunities
17.2 Applications
17.3 Robust Speech Analysis and Recognition
17.4 Integration of Front‐End and Back‐End
17.5 Use of Multimodal Information with Source Separation
17.6 Summary
Bibliography

18.1 Introduction to Binaural Processing
18.2 Binaural Hearing
18.3 Binaural Noise Reduction Paradigms
18.4 The Binaural Noise Reduction Problem
18.5 Extensions for Diffuse Noise
18.6 Extensions for Interfering Sources
18.7 Summary
Bibliography

19.1 Advancing Deep Learning
19.2 Exploiting Phase Relationships
19.3 Advancing Multichannel Processing
19.4 Addressing Multiple‐Device Scenarios
19.5 Towards Widespread Commercial Use
Acknowledgment
Bibliography

Content preview from Audio Source Separation and Speech Enhancement

Preface

Source separation and speech enhancement are some of the most studied technologies in audio signal processing. Their goal is to extract one or more source signals of interest from an audio recording involving several sound sources. This problem arises in many everyday situations. For instance, spoken communication is often obscured by concurrent speakers or by background noise, outdoor recordings feature a variety of environmental sounds, and most music recordings involve a group of instruments. When facing such scenes, humans are able to perceive and listen to individual sources so as to communicate with other speakers, navigate in a crowded street or memorize the melody of a song. Source separation and speech enhancement technologies aim to empower machines with similar abilities.

These technologies are already present in our lives today. Beyond “clean” single‐source signals recorded with close microphones, they allow the industry to extend the applicability of speech and audio processing systems to multi‐source, reverberant, noisy signals recorded with distant microphones. Some of the most striking examples include hearing aids, speech enhancement for smartphones, and distant‐microphone voice command systems. Current technologies are expected to keep improving and spread to many other scenarios in the next few years.

Traditionally, speech enhancement has referred to the problem of segregating speech and background noise, while source separation has referred to the segregation ...

Publisher Resources

ISBN: 9781119279891
