5.4 Parametric stereo encoder
5.4.1 Time/frequency decomposition
The encoder receives a stereo input signal pair x1(n), x2(n) with a sampling rate fs. These input signals are decomposed into time/frequency tiles, using either an STFT or a filterbank. When using an STFT, time-domain segmentation and windowing are typically applied prior to transformation to the frequency domain. When a filterbank is applied, windowing and segmentation can be applied in the filterbank domain as well. If the input signal does not contain strong transients, the frame length (or parameter update rate) should match the lower bound of the measured time constants of the binaural auditory system (i.e., between 23 and 100 ms). Dynamic window switching is preferably used in the case of transients. The purpose of window switching is twofold. First, it accounts for the precedence effect, which dictates that only the first 2 ms of a transient in a reverberant environment determine its perceived location. Second, it prevents ringing artifacts resulting from the frequency-dependent processing, which would otherwise be applied over relatively long segments. The window switching procedure, the essence of which is demonstrated in Figure 5.2, is controlled automatically by a transient detector.
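The STFT route described above can be illustrated with a minimal sketch. It is not the encoder's actual implementation: the Hann window, the 50% overlap, the naive DFT and all function names (hann, dft, stft, frame_length_for) are assumptions chosen for readability, and the 32 ms frame is simply one value inside the 23 to 100 ms range mentioned in the text.

```python
import cmath
import math


def hann(length):
    """Periodic Hann window (an assumed choice; the text names no window)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def dft(frame):
    """Naive O(N^2) DFT of a real frame, returning bins 0..N/2.
    Kept for clarity only; a practical encoder would use an FFT."""
    size = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
        for k in range(size // 2 + 1)
    ]


def stft(signal, frame_len, hop):
    """Segment, window and transform one channel; returns tiles[frame][bin]."""
    window = hann(frame_len)
    tiles = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        tiles.append(dft(segment))
    return tiles


def frame_length_for(fs, duration_ms):
    """Convert a frame duration in milliseconds to a length in samples."""
    return int(round(fs * duration_ms / 1000.0))


# Hypothetical stereo pair: a 500 Hz tone, attenuated in the second channel.
fs = 8000
frame_len = frame_length_for(fs, 32)  # 32 ms lies within 23..100 ms
x1 = [math.sin(2.0 * math.pi * 500.0 * n / fs) for n in range(1024)]
x2 = [0.5 * s for s in x1]

# Both channels use the same segmentation, so tile (frame, bin) of x1
# lines up with tile (frame, bin) of x2.
tiles_1 = stft(x1, frame_len, frame_len // 2)
tiles_2 = stft(x2, frame_len, frame_len // 2)
```

Using identical segmentation for both channels is what makes the tiles comparable: any parameter estimated per tile needs the corresponding tiles of x1 and x2 to cover the same time interval and frequency range.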
If a transient is detected at a certain temporal position, a stop window of variable length is applied which just stops before the transient. The transient itself is captured using a very short window (of the order of a few ...
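The segmentation logic of window switching can be sketched as follows. This is an illustration under stated assumptions, not the detector or window shapes used by the encoder: the energy-ratio detector, its threshold, the non-overlapping segments and the names detect_transient and plan_segments are all hypothetical, and short_len is an arbitrary placeholder rather than a value taken from the text.

```python
def detect_transient(signal, block_len, ratio_threshold):
    """Assumed detector: return the start index of the first block whose
    energy exceeds ratio_threshold times the previous block's energy,
    or None if no such block exists."""
    previous = None
    for start in range(0, len(signal) - block_len + 1, block_len):
        # A small offset keeps the comparison meaningful for silent blocks.
        energy = sum(s * s for s in signal[start:start + block_len]) + 1e-12
        if previous is not None and energy > ratio_threshold * previous:
            return start
        previous = energy
    return None


def plan_segments(total_len, frame_len, short_len, transient_pos):
    """Return (start, end, kind) segments covering [0, total_len).

    Regular segments have frame_len samples. The segment that would contain
    the transient is cut short so that it ends just before the transient
    ("stop"), and the transient itself gets a segment of short_len samples
    ("short"). Overlap between windows is ignored in this sketch.
    """
    segments = []
    pos = 0
    while pos < total_len:
        if transient_pos is not None and pos <= transient_pos < pos + frame_len:
            if transient_pos > pos:
                segments.append((pos, transient_pos, "stop"))
            end = min(transient_pos + short_len, total_len)
            segments.append((transient_pos, end, "short"))
            pos = end
            transient_pos = None  # handle a single transient only
        else:
            end = min(pos + frame_len, total_len)
            segments.append((pos, end, "regular"))
            pos = end
    return segments


# Hypothetical input: silence followed by an abrupt onset at sample 300.
signal = [0.0] * 300 + [1.0] * 212
onset = detect_transient(signal, block_len=4, ratio_threshold=8.0)
segments = plan_segments(len(signal), frame_len=128, short_len=16, transient_pos=onset)
```

The detector locates the onset only to within one block, so block_len bounds how closely the stop segment can approach the transient.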