4.3 Binaural Cue Coding (BCC)

4.3.1 Time–frequency processing

BCC processes audio signals with a certain time and frequency resolution. The frequency resolution is largely motivated by that of the auditory system (see Chapter 3). Psychoacoustic evidence suggests that spatial perception is most likely based on a critical-band representation of the acoustic input signal [26]. This is taken into account by using an invertible filterbank whose sub-bands have bandwidths equal or proportional to the critical bandwidth of the auditory system [98, 293]. The specific time and frequency resolution used for BCC is discussed in Section 4.3.3.
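As a rough illustration of sub-bands whose widths are proportional to the critical bandwidth, the sketch below computes band edges that are equally spaced on an ERB-rate scale. It assumes the Glasberg and Moore ERB approximation as a stand-in for the critical bandwidth; the function names, the 16 kHz upper limit and the default width of 2 ERB per band are illustrative assumptions, not the resolution actually used by BCC (which is given in Section 4.3.3).

```python
import math

def hz_to_erb_rate(f_hz):
    # ERB-rate scale (Glasberg and Moore approximation), used here as an
    # assumed stand-in for a critical-band scale.
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_rate_to_hz(e):
    # Inverse of hz_to_erb_rate.
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def band_edges(f_max_hz, width_in_erb=2.0):
    # Band edges in Hz from 0 to f_max_hz. Every band spans width_in_erb
    # units on the ERB-rate scale, except possibly the last one, which is
    # truncated at f_max_hz. width_in_erb is a hypothetical parameter.
    e_max = hz_to_erb_rate(f_max_hz)
    n_bands = math.ceil(e_max / width_in_erb)
    return [erb_rate_to_hz(min(i * width_in_erb, e_max))
            for i in range(n_bands + 1)]

edges = band_edges(16000.0)
widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
```

The resulting bands are narrow at low frequencies and widen towards high frequencies, which is the qualitative behaviour the critical-band motivation calls for.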

4.3.2 Down-mixing to one channel

It is important that the transmitted down-mix signal contains all signal components of the input audio signal, with each component fully maintained. Simple summation of the audio input channels often amplifies or attenuates signal components: components that are in phase across channels add coherently, whereas components that are out of phase partially cancel. In other words, the power of a signal component in the ‘simple’ sum is often larger or smaller than the sum of the powers of the corresponding component in each channel. Therefore, a down-mixing technique is used which equalizes the down-mix signal such that the power of each signal component in the down-mix is approximately equal to the sum of the corresponding powers over all input channels.
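The equalization idea can be sketched for a single sub-band as follows. This is a minimal illustration, not the scheme of Figure 4.2: it works on one block of real-valued sub-band samples, and the names `equalized_downmix_subband`, `eps` and `max_gain` are hypothetical. The gain limit is an assumed safeguard for the case where the channels nearly cancel in the sum.

```python
import math

def power(x):
    # Mean power of one block of sub-band samples.
    return sum(v * v for v in x) / len(x)

def equalized_downmix_subband(channels, eps=1e-12, max_gain=2.0):
    # channels: list of equal-length sample lists, one per input channel,
    # all belonging to the same sub-band.
    n = len(channels[0])
    summed = [sum(ch[i] for ch in channels) for i in range(n)]

    # Target power: sum of the powers of the individual channels.
    target = sum(power(ch) for ch in channels)
    # Power actually present in the simple sum.
    actual = power(summed)

    # Gain that restores the target power; eps avoids division by zero
    # and max_gain (an assumption) bounds the amplification.
    gain = min(math.sqrt(target / (actual + eps)), max_gain)
    return [gain * v for v in summed], gain

x = [1.0, -1.0, 0.5, -0.5]
# Two identical channels: the simple sum has four times the power of one
# channel, whereas the target is only twice that power.
downmix, gain = equalized_downmix_subband([x, x])
```

For two identical channels the gain comes out at about 0.707, so the in-phase build-up of the simple sum is attenuated back to the summed channel power. For uncorrelated channels the simple sum already has roughly the target power and the gain stays close to 1.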

Figure 4.2 shows the down-mixing scheme. The input audio channels x_c(n) (1 ≤ c ≤ C) are decomposed into a number of sub-bands. ...
