2 Time‐Frequency Processing: Spectral Properties

Tuomas Virtanen, Emmanuel Vincent, and Sharon Gannot

Many audio signal processing algorithms do not operate on raw time‐domain audio signals but on time‐frequency representations. A raw audio signal encodes the amplitude of a sound as a function of time. Its Fourier spectrum represents the signal as a function of frequency but does not capture variations over time. A time‐frequency representation expresses the amplitude of a sound as a function of both time and frequency, and can therefore jointly account for its temporal and spectral characteristics (Gröchenig, 2001).
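The difference between a Fourier spectrum and a time‐frequency representation can be seen in a minimal sketch of the short‐time Fourier transform (STFT), one common time‐frequency representation. The sketch assumes NumPy, a Hann window, and illustrative values for the frame length (512 samples), hop size (256 samples), and sampling rate (16 kHz). The helper `stft` is hypothetical and is not taken from this chapter.

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Hypothetical helper: complex STFT of shape (num_frames, frame_len // 2 + 1).

    Illustrative only: Hann window, no padding; trailing samples that do not
    fill a whole frame are dropped.
    """
    window = np.hanning(frame_len)
    num_frames = 1 + (len(x) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the analysis window.
    frames = np.stack([x[n * hop : n * hop + frame_len] * window
                       for n in range(num_frames)])
    # Real FFT of each frame: rows are time frames, columns are frequency bins.
    return np.fft.rfft(frames, axis=1)

fs = 16000                       # assumed sampling rate in Hz
t = np.arange(fs) / fs           # one second of samples
# Test signal: a 500 Hz tone that switches to 2000 Hz halfway through.
x = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 2000 * t))

X = stft(x)
freqs = np.fft.rfftfreq(512, d=1 / fs)
# Frequency of the strongest bin in the first and last time frames.
peak_first = freqs[np.argmax(np.abs(X[0]))]
peak_last = freqs[np.argmax(np.abs(X[-1]))]
print(X.shape, peak_first, peak_last)
```

A single Fourier spectrum of the whole signal would show peaks at both 500 Hz and 2000 Hz without indicating which tone came first; in the STFT, each peak appears only in the time frames where that tone is present.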

In the context of source separation and speech enhancement, time‐frequency representations are appropriate for three reasons. First, separation and enhancement often require modeling the structure of sound sources. Natural sound sources exhibit prominent structure in both time and frequency, which is easy to model in the time‐frequency domain. Second, sound sources are often mixed convolutively, and this convolutive mixing can be approximated by simpler operations in the time‐frequency domain, such as a multiplication in each frequency band. Third, natural sounds are more sparsely distributed and overlap less with each other in the time‐frequency domain than in the time or frequency domain, which facilitates their separation.
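The second point can be checked numerically. Under the commonly used narrowband approximation (named here as an assumption, not quoted from this chapter), convolving a source with a filter much shorter than the analysis frame is approximately equivalent to multiplying each STFT frame by the filter's frequency response. The sketch below assumes NumPy, a white‐noise source, a hypothetical 16‐tap decaying filter, and a hypothetical `stft` helper with a 1024‐sample Hann window; it compares the exact STFT of the convolved signal with the per‐bin product and reports the relative error.

```python
import numpy as np

def stft(x, frame_len=1024, hop=512):
    """Hypothetical helper: Hann-windowed STFT, frames as rows, bins as columns."""
    window = np.hanning(frame_len)
    num_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[n * hop : n * hop + frame_len] * window
                       for n in range(num_frames)])
    return np.fft.rfft(frames, axis=1)

rng = np.random.default_rng(0)
frame_len = 1024
s = rng.standard_normal(16000)                              # source: white noise
h = rng.standard_normal(16) * np.exp(-np.arange(16) / 4.0)  # hypothetical short filter

# Exact convolutive mixing in the time domain, truncated to the source length.
x = np.convolve(s, h)[: len(s)]

X = stft(x, frame_len)               # STFT of the mixed signal
S = stft(s, frame_len)               # STFT of the source
H = np.fft.rfft(h, n=frame_len)      # filter frequency response on the same bins

# Narrowband approximation: X[n, f] is approx. H[f] * S[n, f] in every frame n.
approx = S * H
rel_err = np.linalg.norm(X - approx) / np.linalg.norm(X)
print(f"relative error: {rel_err:.3f}")
```

With a filter this short relative to the frame, the error is small. As the filter length approaches or exceeds the frame length, the approximation degrades, which is why it is usually stated for filters shorter than the analysis window.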

In this chapter we introduce the most common time‐frequency representations used for source separation and speech enhancement. Section 2.1 describes the procedure for calculating a time‐frequency representation and ...
