Speech signals are complex and have numerous characteristics. Here, we present only a few properties that are significant in signal compression.

Figure 5.1 shows a speech signal, 0.5 s in length, from a female speaker in the time domain (Figure 5.1(a)) and in the frequency domain (Figure 5.1(b)).

The left-hand graph shows that the signal is clearly not stationary, but it can be considered locally stationary over periods on the order of a few tens of milliseconds. In speech coding, analysis frames of 20 ms are the standard choice.
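As an illustration of what a 20 ms analysis frame means in practice, here is a minimal sketch that cuts a sampled signal into consecutive, non-overlapping frames. The function name `frame_signal` and the 8 kHz sampling rate are assumptions made for the example; the text does not state the sampling rate of the signal in Figure 5.1, and real coders often use overlapping or windowed frames.

```python
def frame_signal(samples, sample_rate_hz, frame_ms=20.0):
    """Split `samples` into consecutive non-overlapping frames of `frame_ms` ms.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# Hypothetical example: 0.5 s of signal at an assumed 8 kHz sampling rate.
sample_rate_hz = 8000                            # assumption, not from the text
signal = [0.0] * int(0.5 * sample_rate_hz)       # placeholder for 4000 real samples
frames = frame_signal(signal, sample_rate_hz)
print(len(frames), len(frames[0]))               # 25 160
```

Under these assumptions, the 0.5 s excerpt yields 25 frames of 160 samples each, and each frame is short enough to be treated as locally stationary.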

Different sound types can be distinguished: voiced sounds, unvoiced sounds^{1}, and plosive sounds. Voiced and unvoiced sounds can be compressed effectively, as we will see below. However, no equivalent technique exists for plosive sounds or for transitions between phonemes.

The third characteristic, which is very important as we will see, is the existence of a simple and effective production model. Let us examine the graph in Figure 5.1(b), which gives two spectral estimates. The first estimate is obtained by calculating a periodogram, that is, by taking the square of the modulus of the discrete Fourier transform of N samples. This calculation yields a good approximation of the true, but inaccessible, power spectral density. This spectral density is known as . It is represented by a curve with numerous peaks. The second estimate uses ...
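The periodogram described above can be sketched in a few lines. This is a minimal illustration, not the computation used for Figure 5.1(b): the function name `periodogram` is hypothetical, the 1/N normalization is one common convention that the text does not specify, and a direct O(N²) discrete Fourier transform is used only to keep the example self-contained (an FFT routine would be used in practice).

```python
import cmath
import math


def periodogram(x):
    """Return |X(k)|^2 / N for k = 0..N-1, where X is the DFT of the N samples x.

    The 1/N scaling is an assumed convention; other normalizations exist.
    """
    n = len(x)
    if n == 0:
        raise ValueError("need at least one sample")
    estimate = []
    for k in range(n):
        # Direct evaluation of the DFT coefficient X(k).
        acc = 0j
        for m, sample in enumerate(x):
            acc += sample * cmath.exp(-2j * math.pi * k * m / n)
        estimate.append(abs(acc) ** 2 / n)
    return estimate


# Toy input: a cosine with exactly 4 periods in N = 32 samples.
N = 32
x = [math.cos(2 * math.pi * 4 * m / N) for m in range(N)]
p = periodogram(x)
peak_bin = max(range(N // 2), key=lambda k: p[k])
print(peak_bin)  # 4
```

For this toy input the energy concentrates in bin 4 (and its mirror image, bin 28). A real speech frame would instead produce the irregular, many-peaked curve mentioned in the text.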
