Audio Source Separation and Speech Enhancement

by Emmanuel Vincent, Tuomas Virtanen, Sharon Gannot
October 2018
Intermediate to advanced
504 pages
18h 50m
English
Wiley

8 Nonnegative Matrix Factorization

Roland Badeau and Tuomas Virtanen

Nonnegative matrix factorization (NMF) refers to a set of techniques that have been used to model the spectra of sound sources in various audio applications, including source separation. Sound sources have a structure in time and frequency: music consists of basic units like notes and chords played by different instruments, speech consists of elementary units such as phonemes, syllables or words, and environmental sounds consist of sound events produced by various sound sources. NMF models this structure by representing the spectra of sounds as a sum of components, each with a fixed spectrum and a time‐varying gain, so that each component in the model represents one of these elementary units in the sound.
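The "sum of components with fixed spectrum and time‐varying gain" can be illustrated with a minimal NumPy sketch. The names `W`, `H`, `V`, the matrix sizes and the random values are assumptions chosen for illustration and are not taken from the chapter; the sketch only shows that summing per-component spectra scaled by their gains is the same as a product of two nonnegative matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
F, N, K = 6, 8, 2  # frequency bins, time frames, components (illustrative sizes)

W = rng.random((F, K))  # column k: fixed spectrum of component k (assumed name)
H = rng.random((K, N))  # row k: time-varying gain of component k (assumed name)

# Sum of K components, each a fixed spectrum scaled by its gain in every frame.
V_sum = sum(np.outer(W[:, k], H[k, :]) for k in range(K))

# The same model written compactly as a matrix product.
V = W @ H
```

Because every entry of `W` and `H` is nonnegative, components can only add energy to the modeled spectrogram and never cancel each other.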

Modeling this structure is beneficial in source separation, since inferring the structure makes it possible to use contextual information for source separation. NMF is typically used to model the magnitude or power spectrogram of audio signals, and its ability to represent the structure of audio sources makes separation possible even in single‐channel scenarios.
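As a rough illustration of single‐channel separation on a magnitude spectrogram, the sketch below fits an NMF model and then splits the mixture with soft masks. It rests on several assumptions that are not stated in this preview: it uses the well-known multiplicative updates for the Euclidean cost (one of many possible NMF algorithms), the function names `nmf_euclidean` and `soft_mask_separate` are hypothetical, and the assignment of components to sources (`groups`) is assumed to be known.

```python
import numpy as np

def nmf_euclidean(V, K, n_iter=200, eps=1e-12, seed=0):
    """Factor a nonnegative matrix V (F x N) as W @ H with K components,
    using multiplicative updates for the squared Euclidean error."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps  # component spectra
    H = rng.random((K, N)) + eps  # component gains
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def soft_mask_separate(V, W, H, groups, eps=1e-12):
    """Split V into one estimate per source, where each source is a list of
    component indices (the grouping is assumed to be given)."""
    V_hat = W @ H + eps
    return [V * ((W[:, g] @ H[g, :]) / V_hat) for g in groups]
```

A usage example on synthetic data, where the "mixture" is itself built from two nonnegative components:

```python
rng = np.random.default_rng(1)
V_mix = rng.random((20, 2)) @ rng.random((2, 30))  # synthetic magnitude spectrogram
W_est, H_est = nmf_euclidean(V_mix, K=2)
estimates = soft_mask_separate(V_mix, W_est, H_est, groups=[[0], [1]])
```

When the groups cover all components, the masks sum to one in every time-frequency bin, so the source estimates add back up to the mixture.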

This chapter presents the use of NMF‐based single‐channel techniques. In Section 8.1 we introduce the basic NMF model used in various single‐channel source separation scenarios. In Section 8.2, several deterministic and probabilistic frameworks for NMF are presented, along with various NMF algorithms. Then several methods that can be used to learn NMF components by using suitable ...


Publisher Resources

ISBN: 9781119279891