7.2 Phase Vocoder Basics
The concepts of short-time Fourier analysis and synthesis have been widely described in the literature [Por76, Cro80, CR83]. We briefly summarize the basics and define the notation used in applying them to digital audio effects.
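The short-time Fourier transform (STFT) of the signal x(n) is given by
\[ X(n, k) = \sum_{m=-\infty}^{\infty} x(m)\, h(n - m)\, e^{-j 2\pi m k / N} = |X(n, k)|\, e^{j\varphi(n, k)}, \qquad k = 0, 1, \ldots, N - 1. \qquad (7.1) \]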
X(n, k) is a complex number whose magnitude |X(n, k)| and phase φ(n, k) describe a time-varying spectrum at frequency bin (index) 0 ≤ k ≤ N − 1 and time index n. Note that the summation index in (7.1) is m. At each time index n the signal x(m) is weighted by a finite-length window h(n − m). Because the window has finite length, the sum over m in (7.1) is finite and can be computed with an FFT of length N. Figure 7.3 shows the input signal x(m) and the sliding window h(n − m) for three values of the time index n. The middle plot shows the finite-length windowed segments x(m) · h(n − m). These segments are transformed by the FFT, yielding the short-time spectra X(n, k) given by (7.1). The lower two rows in Figure 7.3 show the magnitude and phase spectra of the corresponding time segments.
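As an illustration, the following minimal Python sketch computes one short-time spectrum X(n, k) in two ways: by the finite sum over m, and by an FFT of length N applied to the windowed segment. It rests on assumptions that are not part of the text above: the window is taken to be causal with exactly N samples, h(0), ..., h(N − 1), so that h(n − m) is nonzero only for n − N + 1 ≤ m ≤ n; samples outside the signal are treated as zero; and the function names `stft_frame_direct` and `stft_frame_fft` are hypothetical. Because the exponent in the sum uses the absolute index m, the FFT of the segment is multiplied by a linear phase term that accounts for the starting index of the segment.

```python
import numpy as np

def stft_frame_direct(x, h, n, N):
    """Finite sum over m of x(m) h(n - m) exp(-j 2 pi m k / N), k = 0..N-1.

    Assumption: h holds a causal window h(0..N-1), so only
    m = n-N+1 .. n contribute to the sum.
    """
    k = np.arange(N)
    X = np.zeros(N, dtype=complex)
    for m in range(n - N + 1, n + 1):
        if 0 <= m < len(x):  # samples outside the signal count as zero
            X += x[m] * h[n - m] * np.exp(-2j * np.pi * m * k / N)
    return X

def stft_frame_fft(x, h, n, N):
    """Same spectrum via an FFT of length N on the windowed segment."""
    m0 = n - N + 1                      # first index covered by the window
    m = np.arange(m0, n + 1)
    inside = (m >= 0) & (m < len(x))
    seg = np.where(inside, x[np.clip(m, 0, len(x) - 1)], 0.0) * h[n - m]
    # The FFT indexes the segment from 0, i.e. it uses m - m0 in the exponent.
    # Multiplying by exp(-j 2 pi m0 k / N) restores the absolute index m.
    k = np.arange(N)
    return np.fft.fft(seg) * np.exp(-2j * np.pi * m0 * k / N)

# Example with an arbitrary test signal and a Hann window (both assumptions).
N = 16
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
h = np.hanning(N)
n = 40

X_direct = stft_frame_direct(x, h, n, N)
X_fft = stft_frame_fft(x, h, n, N)

magnitude = np.abs(X_fft)    # |X(n, k)|
phase = np.angle(X_fft)      # phi(n, k), principal value in (-pi, pi]
```

Both routes should agree to numerical precision, and `magnitude` and `phase` correspond to the two quantities plotted per segment in the lower rows of Figure 7.3. Other window alignments (for example, a window centered on n) change only the index range of the segment and the phase term.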