8 Nonnegative Matrix Factorization

Roland Badeau and Tuomas Virtanen

Nonnegative matrix factorization (NMF) refers to a set of techniques that have been used to model the spectra of sound sources in various audio applications, including source separation. Sound sources have a structure in time and frequency: music consists of basic units like notes and chords played by different instruments, speech consists of elementary units such as phonemes, syllables or words, and environmental sounds consist of sound events produced by various sound sources. NMF models this structure by representing the spectra of sounds as a sum of components, each with a fixed spectrum and a time‐varying gain, so that each component in the model represents one of these elementary units in the sound.
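As a minimal sketch of the "sum of components with fixed spectrum and time‐varying gain" idea, the following NumPy snippet builds a small model spectrogram from random nonnegative factors. The names `W`, `H`, `V_model` and the sizes `F`, `N`, `K` are illustrative assumptions and are not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed): frequency bins, time frames, components.
F, N, K = 6, 8, 2

# Each column of W is one component's fixed, nonnegative spectrum.
# Each row of H is that component's nonnegative gain over time.
W = rng.random((F, K))
H = rng.random((K, N))

# Model spectrogram: sum over components of (fixed spectrum) x (time-varying gain).
V_model = sum(np.outer(W[:, k], H[k, :]) for k in range(K))

# The same sum written compactly as a matrix product.
assert np.allclose(V_model, W @ H)
```

Writing the sum as the product `W @ H` is what makes the model a matrix factorization: the spectrogram is approximated by two smaller nonnegative matrices.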

Modeling this structure is beneficial in source separation, since inferring it makes it possible to exploit contextual information during separation. NMF is typically used to model the magnitude or power spectrogram of audio signals, and its ability to represent the structure of audio sources makes separation possible even in single‐channel scenarios.
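To make the single‐channel idea concrete, here is a hedged sketch that fits an NMF model to a toy nonnegative matrix standing in for a magnitude spectrogram, then splits that matrix into per‐component estimates with ratio masks. It assumes the well-known multiplicative updates for the Euclidean cost, which is only one of many possible NMF algorithms and is not necessarily the one the chapter favors. The function names `nmf_euclidean` and `component_masks`, the toy data, and all parameter values are hypothetical.

```python
import numpy as np

def nmf_euclidean(V, K, n_iter=200, eps=1e-12, seed=0):
    """Fit V ~= W @ H with multiplicative updates (Euclidean cost, assumed here)."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps  # nonnegative spectra
    H = rng.random((K, N)) + eps  # nonnegative gains
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def component_masks(W, H, eps=1e-12):
    """Ratio masks: each component's share of the model at every time-frequency bin."""
    V_hat = W @ H + eps
    return [np.outer(W[:, k], H[k, :]) / V_hat for k in range(W.shape[1])]

# Toy "magnitude spectrogram" (assumed data): two synthetic nonnegative sources mixed.
rng = np.random.default_rng(1)
V = rng.random((10, 2)) @ rng.random((2, 30))

W, H = nmf_euclidean(V, K=2)
masks = component_masks(W, H)

# Per-component spectrogram estimates obtained by masking the mixture.
estimates = [m * V for m in masks]
```

In a real system the masked spectrograms would be combined with the mixture phase and inverted back to the time domain; that step is omitted here, as is any grouping of components into sources.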

This chapter presents the use of NMF‐based single‐channel techniques. In Section 8.1 we introduce the basic NMF model used in various single‐channel source separation scenarios. In Section 8.2, several deterministic and probabilistic frameworks for NMF are presented, along with various NMF algorithms. Then several methods that can be used to learn NMF components by using suitable ...
