Skip to Content
Audio Source Separation and Speech Enhancement
book

Audio Source Separation and Speech Enhancement

by Emmanuel Vincent, Tuomas Virtanen, Sharon Gannot
October 2018
Intermediate to advanced
504 pages
18h 50m
English
Wiley
Content preview from Audio Source Separation and Speech Enhancement

2Time‐Frequency Processing: Spectral Properties

Tuomas Virtanen Emmanuel Vincent and Sharon Gannot

Many audio signal processing algorithms typically do not operate on raw time‐domain audio signals, but rather on time‐frequency representations. A raw audio signal encodes the amplitude of a sound as a function of time. Its Fourier spectrum represents it as a function of frequency, but does not represent variations over time. A time‐frequency representation presents the amplitude of a sound as a function of both time and frequency, and is able to jointly account for its temporal and spectral characteristics (Gröchenig, 2001).

Time‐frequency representations are appropriate for three reasons in our context. First, separation and enhancement often require modeling the structure of sound sources. Natural sound sources have a prominent structure both in time and frequency, which can be easily modeled in the time‐frequency domain. Second, the sound sources are often mixed convolutively, and this convolutive mixing process can be approximated with simpler operations in the time‐frequency domain. Third, natural sounds are more sparsely distributed and overlap less with each other in the time‐frequency domain than in the time or frequency domain, which facilitates their separation.

In this chapter we introduce the most common time‐frequency representations used for source separation and speech enhancement. Section 2.1 describes the procedure for calculating a time‐frequency representation and ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Techniques for Noise Robustness in Automatic Speech Recognition

Techniques for Noise Robustness in Automatic Speech Recognition

Rita Singh, Tuomas Virtanen, Bhiksha Raj
Parametric Time-Frequency Domain Spatial Audio

Parametric Time-Frequency Domain Spatial Audio

Ville Pulkki, Symeon Delikaris-Manias, Archontis Politis

Publisher Resources

ISBN: 9781119279891Purchase book