In Chapters 32 to 34 we looked at a number of approaches to encoding speech signals with the goal of minimizing the size of the representation (i.e., the bitrate required for real-time transmission, or the number of bytes required for offline storage) while preserving the quality of the speech. These schemes started from the source-filter model of speech and exploited the constraints of that model to encode intelligible speech in drastically fewer bits than the original waveform requires. However, while fully intelligible, the reconstructed signals were usually easy to distinguish from the originals.

In this chapter, we consider the situation where (a) we cannot assume that the source material is speech, or indeed any particular limited class of sound, and (b) our goal is a reconstructed signal that, for a normal listener, is indistinguishable from the original – that is, the coding scheme is ‘transparent’. The most prominent application of these techniques is for encoding music and other entertainment content such as the audio tracks of movies, and the best-known example of this family of coders is the MPEG-1 Audio layer 3 standard, better known as MP3. These schemes have been quite successful: taking an uncompressed CD audio stream as a starting point (16 bits per sample × 2 samples per frame ...
