Chapter 6

Speech Coding

6.1. PCM and ADPCM coders

Telephone-band speech signals occupy a much narrower band than natural speech, which results in a significant loss in quality. A 16-bit quantization, which (along with a sampling frequency of 44.1 kHz) ensures “compact disc” quality, is therefore unnecessary. We can say that a 12-bit quantization is “homogeneous” with a sampling frequency of 8 kHz. The reference bit rate for telephone-band speech signals is therefore of the order of 100 kb/s (12 bits × 8 kHz = 96 kb/s).

The 64 kb/s PCM ITU-T G.711 coder (8 bits per sample at 8 kHz) applies a non-uniform scalar quantization of the kind presented (very briefly) in section 1.2.3. A plain uniform scalar quantization is poorly suited to signals whose instantaneous power varies widely. It can be shown that, to maintain a roughly constant signal-to-noise ratio, the appropriate non-linear transform is logarithmic. In practice, this transform is approximated by straight-line segments. The resulting characteristic is known as “A-law” in Europe and “μ-law” in the USA.
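The effect of logarithmic companding on the signal-to-noise ratio can be checked numerically. The sketch below is only an illustration, not the G.711 algorithm: it uses the continuous μ-law formula with μ = 255 rather than the segmented approximation, and the helper names (mu_compress, mu_expand, quantize_uniform, snr_db) as well as the 440 Hz test tone are assumptions chosen for the example.

```python
import math

MU = 255.0  # mu-law parameter; the continuous curve stands in for the segmented one


def mu_compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1] along a logarithmic curve."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_expand(y, mu=MU):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize_uniform(v, bits=8):
    """Uniform quantizer on [-1, 1]; returns the centre of the selected cell."""
    levels = 2 ** bits
    step = 2.0 / levels
    idx = min(levels - 1, max(0, int(math.floor((v + 1.0) / step))))
    return -1.0 + (idx + 0.5) * step


def snr_db(amplitude, companded, n=8000, fs=8000.0, f0=440.0):
    """SNR in dB of an 8-bit quantized sine tone (hypothetical test signal)."""
    sig = noise = 0.0
    for k in range(n):
        x = amplitude * math.sin(2.0 * math.pi * f0 * k / fs)
        if companded:
            xq = mu_expand(quantize_uniform(mu_compress(x)))
        else:
            xq = quantize_uniform(x)
        sig += x * x
        noise += (x - xq) ** 2
    return 10.0 * math.log10(sig / noise)
```

Comparing snr_db(1.0, True) with snr_db(0.01, True), and then the same two amplitudes with companded=False, should show the companded quantizer holding its signal-to-noise ratio within a few dB over a 40 dB drop in level, while the uniform quantizer loses roughly as many dB as the signal does.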

The operating principle of the 32 kb/s ADPCM ITU-T G.726 coder shown in Figure 1.4 corresponds to a closed-loop predictive scalar quantizer. There are several possible linear prediction models. The G.726 coder uses an ARMA(2,6) model in which the two coefficients of the AR part and the six coefficients of the MA part are updated at the sample rate. To estimate these coefficients, a gradient method is used, but in reality this means ...
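The closed-loop principle can be sketched in a few lines. This is a deliberately simplified illustration and not the G.726 algorithm: it assumes a purely autoregressive predictor of order 2 instead of the ARMA(2,6) model, a normalized gradient (LMS) update, and a fixed-step 4-bit quantizer (4 bits × 8 kHz = 32 kb/s). All names and constants (STEP, BITS, ADAPT, tone) are hypothetical. The key point is that the predictor is driven only by quantized quantities, so the decoder can reproduce the encoder's state exactly.

```python
import math

STEP = 0.05   # fixed quantizer step (assumption; real coders adapt the step)
BITS = 4      # 4 bits per sample at 8 kHz gives 32 kb/s
ADAPT = 0.01  # gradient step size (assumption)


def new_state():
    """Predictor coefficients and the last two reconstructed samples."""
    return {"a": [0.0, 0.0], "hist": [0.0, 0.0]}


def reconstruct(state, idx):
    """Shared by encoder and decoder: rebuild one sample, then adapt."""
    a, h = state["a"], state["hist"]
    pred = a[0] * h[0] + a[1] * h[1]
    eq = idx * STEP                      # quantized prediction error
    xr = pred + eq                       # reconstructed sample
    norm = h[0] * h[0] + h[1] * h[1] + 1e-6
    for i in range(2):                   # normalized LMS gradient step
        a[i] += ADAPT * eq * h[i] / norm
    state["hist"] = [xr, h[0]]
    return xr


def quantize(e):
    """Uniform quantizer with saturation to the signed 4-bit index range."""
    lo, hi = -(2 ** (BITS - 1)), 2 ** (BITS - 1) - 1
    return max(lo, min(hi, int(round(e / STEP))))


def encode(samples):
    state, codes, recon = new_state(), [], []
    for x in samples:
        a, h = state["a"], state["hist"]
        pred = a[0] * h[0] + a[1] * h[1]  # prediction from past reconstructions
        idx = quantize(x - pred)
        codes.append(idx)
        recon.append(reconstruct(state, idx))
    return codes, recon


def decode(codes):
    state = new_state()
    return [reconstruct(state, idx) for idx in codes]


def tone(n, f0=200.0, fs=8000.0, amp=0.5):
    """Hypothetical test signal: a sine tone sampled at 8 kHz."""
    return [amp * math.sin(2.0 * math.pi * f0 * k / fs) for k in range(n)]
```

Calling encode(tone(400)) and passing the resulting codes to decode should return exactly the reconstruction computed inside the encoder, since both sides run the same update on the same quantized data.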
