Codecs are generally understood to be various mathematical models used to digitally encode (and compress) analog audio information. Many of these models take into account the human brain’s ability to form an impression from incomplete information. We’ve all seen optical illusions; likewise, voice-compression algorithms take advantage of our tendency to interpret what we believe we should hear, rather than what we actually hear.[109] The purpose of the various encoding algorithms is to strike a balance between efficiency and quality.[110]
Originally, the term codec referred to a COder/DECoder: a device that converts between analog and digital. Now, the term seems to relate more to COmpression/DECompression.
Before we dig into the individual codecs, take a look at Table 8-1—it’s a quick reference that you may want to refer back to.
Table 8-1. Codec quick reference
Codec | Data bitrate | License required? |
---|---|---|
G.711 | 64 Kbps | No |
G.726 | 16, 24, 32, or 40 Kbps | No |
G.729A | 8 Kbps | Yes (no for passthrough) |
GSM | 13 Kbps | No |
iLBC | 13.3 Kbps (30-ms frames) or 15.2 Kbps (20-ms frames) | No |
Speex | Variable (between 2.15 and 22.4 Kbps) | No |
G.711 is the fundamental codec of the PSTN. In fact, if someone refers to PCM (discussed in the previous chapter) with respect to a telephone network, you can safely assume they mean G.711. Two companding methods are used: μlaw in North America and alaw in the rest of the world. Either one delivers an 8-bit word transmitted 8,000 times per second. If you do the math, you will see that this requires 64,000 bits to be transmitted per second.
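If you would like to check that arithmetic, here is a minimal Python sketch. The function name `pcm_bitrate` is our own invention for illustration; the CD figures are the ones quoted in the footnote on audio CDs.

```python
def pcm_bitrate(bits_per_sample: int, samples_per_second: int, channels: int = 1) -> int:
    """Return the raw bitrate, in bits per second, of a sampled audio stream."""
    return bits_per_sample * samples_per_second * channels

# Telephone network: 8-bit words, 8,000 times per second, one channel.
g711_bps = pcm_bitrate(8, 8000)

# Audio CD: 16-bit samples at 44,100 Hz, two channels (stereo).
cd_bps = pcm_bitrate(16, 44100, channels=2)

print(g711_bps)  # 64000
print(cd_bps)    # 1411200
```

Seen this way, a CD stream needs roughly 22 times the bandwidth of a single G.711 call.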
Many people will tell you that G.711 is an uncompressed codec. This is not exactly true, as companding is considered a form of compression. What is true is that G.711 is the base codec from which all of the others are derived.
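To get a feel for why companding counts as compression, consider this rough sketch of the μlaw curve. It uses the continuous textbook formula with μ = 255; real G.711 implementations use a segmented approximation of this curve and a specific bit layout, so the helper names and the simplified 8-bit rounding below are illustrative assumptions, not the G.711 encoder itself.

```python
import math

MU = 255  # companding constant used by North American mu-law

def mu_law_compress(x: float) -> float:
    """Squeeze a sample in [-1.0, 1.0] through the logarithmic mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y: float) -> float:
    """Invert mu_law_compress, recovering an approximation of the sample."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def quantize(value: float, levels: int = 127) -> float:
    """Round a value in [-1.0, 1.0] onto a simplified signed 8-bit grid (hypothetical)."""
    return round(value * levels) / levels

quiet = 0.01  # a quiet sample, 1% of full scale

# Plain 8-bit rounding of the raw sample.
linear = quantize(quiet)

# Compand first, round to the same 8-bit grid, then expand again.
companded = mu_law_expand(quantize(mu_law_compress(quiet)))

print(f"linear error:    {abs(linear - quiet):.6f}")
print(f"companded error: {abs(companded - quiet):.6f}")
```

The companded value lands much closer to the original quiet sample than the plain 8-bit value does, which is the point of the exercise: the same 8 bits carry quiet speech with far more useful detail.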
G.711 imposes minimal (almost zero) load on the CPU.
G.726 has been around for some time (it used to be G.721, which is now obsolete), and it is one of the original compressed codecs. It is also known as Adaptive Differential Pulse-Code Modulation (ADPCM), and it can run at several bitrates. The most common rates are 16 Kbps, 24 Kbps, and 32 Kbps. As of this writing, Asterisk supports only the ADPCM-32 rate, which is far and away the most popular rate for this codec.
G.726 offers quality nearly identical to G.711, but it uses only half the bandwidth. This is possible because rather than sending the result of the quantization measurement, it sends only enough information to describe the difference between the current sample and the previous one. G.726 fell from favor in the 1990s due to its inability to carry modem and fax signals, but because of its bandwidth/CPU performance ratio it is now making a comeback. G.726 is especially attractive because it does not require a lot of computational work from the system.
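The idea of sending differences rather than whole samples can be sketched in a few lines of Python. This is plain delta coding, not G.726: the real codec also predicts the next sample and adapts its quantizer step size as it goes. The function names and sample values are made up for illustration.

```python
def delta_encode(samples):
    """Replace each sample with its difference from the previous sample."""
    previous = 0
    deltas = []
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas

def delta_decode(deltas):
    """Rebuild the samples by accumulating the differences."""
    previous = 0
    samples = []
    for delta in deltas:
        previous += delta
        samples.append(previous)
    return samples

samples = [100, 103, 105, 104, 101, 99]  # hypothetical slowly changing signal
deltas = delta_encode(samples)

print(deltas)                # [100, 3, 2, -1, -3, -2]
print(delta_decode(deltas))  # [100, 103, 105, 104, 101, 99]
```

After the first value, the differences are small numbers that fit in far fewer bits than the samples themselves. At the 32 Kbps rate, G.726 spends 4 bits per sample instead of G.711's 8, which is where the halved bandwidth comes from.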
Considering how little bandwidth it uses, G.729A delivers impressive sound quality. It does this through the use of Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP).[111] Because of patents, you can’t use G.729A without paying a licensing fee; however, it is extremely popular and is thus well supported on many different phones and systems.
To achieve its impressive compression ratio, this codec requires an equally impressive amount of effort from the CPU. In an Asterisk system, the use of heavily compressed codecs will quickly bog down the CPU.
G.729A uses 8 Kbps of bandwidth.
GSM is the darling codec of Asterisk. This codec does not come encumbered with a licensing requirement the way that G.729A does, and it offers outstanding performance with respect to the demand it places on the CPU. The sound quality is generally considered to be of a lesser grade than that produced by G.729A, but much of this comes down to personal opinion; be sure to try it out. GSM operates at 13 Kbps.
The Internet Low Bitrate Codec (iLBC) provides an attractive mix of low bandwidth usage and quality, and it is especially well suited to sustaining reasonable quality on lossy network links.
Naturally, Asterisk supports it (and support elsewhere is growing), but it is not as popular as the ITU codecs and, thus, may not be compatible with common IP telephones and commercial VoIP systems. IETF RFCs 3951 and 3952 have been published in support of iLBC, and iLBC is on the IETF standards track.
Because iLBC uses complex algorithms to achieve its high levels of compression, it has a fairly high CPU cost in Asterisk.
While you are allowed to use iLBC without paying royalty fees, the holder of the iLBC patent, Global IP Sound (GIPS), wants to know whenever you use it in a commercial application. The way you do that is by downloading and printing a copy of the iLBC license, signing it, and returning it to GIPS. If you want to read about iLBC and its license, you can do so at http://www.ilbcfreeware.org.
iLBC operates at 13.3 Kbps (30 ms frames) and 15.2 Kbps (20 ms frames).
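The two bitrates follow directly from the size of each encoded frame. In this small sketch, the frame sizes of 50 bytes (30 ms) and 38 bytes (20 ms) are taken from RFC 3951 rather than from the text above, and the names `ILBC_MODES` and `bitrate_bps` are ours.

```python
# Frame duration in milliseconds -> encoded frame size in bytes (per RFC 3951).
ILBC_MODES = {30: 50, 20: 38}

def bitrate_bps(frame_bytes: int, frame_ms: int) -> float:
    """Return the codec bitrate in bits per second for a given frame size and duration."""
    return frame_bytes * 8 * 1000 / frame_ms

for frame_ms, frame_bytes in ILBC_MODES.items():
    kbps = bitrate_bps(frame_bytes, frame_ms) / 1000
    print(f"{frame_ms} ms frames: {frame_bytes} bytes each, {kbps:.1f} Kbps")
```

Note that these figures cover only the encoded audio; they do not include any packet header overhead.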
Speex is a variable bitrate (VBR) codec, which means that it is able to dynamically modify its bitrate to respond to changing network conditions. It is offered in both narrowband and wideband versions, depending on whether you want telephone quality or better.
Speex is a totally free codec, licensed under the Xiph.org variant of the BSD license.
An Internet draft for Speex is available, and more information about Speex can be found at its home page (http://www.speex.org).
Speex can operate at anywhere from 2.15 to 22.4 Kbps, due to its variable bitrate.
Sure thing, MP3 is a codec. Specifically, it’s the Moving Picture Experts Group Audio Layer 3 Encoding Standard.[112] With a name like that, it’s no wonder we call it MP3! MP3 is not a telephony codec, as it is optimized for music, not voice; nevertheless, it’s very popular with VoIP telephony systems, Asterisk included, as a method of delivering Music on Hold (MoH).
[109] “Aoccdrnig to rsereach at an Elingsh uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is taht frist and lsat ltteres are in the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by istlef, but the wrod as a wlohe.” (The source of this quote is unknown―see http://www.bisso.com/ujg_archives/000228.html.) We do the same thing with sound―if there is enough information, our brain can fill in the gaps.
[110] On an audio CD, quality is far more important than saving bandwidth, so the audio is quantized at 16 bits (times 2, as it’s stereo), with a sampling rate of 44,100 Hz. Considering that the CD was invented in the late 1970s, this was quite impressive stuff back then. The telephone network does not require this level of quality (and needs to optimize bandwidth), so telephone signals are encoded using 8 bits, at a sampling frequency of 8,000 Hz.
[111] CELP is a popular method of compressing speech. By mathematically modeling the various ways humans make sounds, a codebook of sounds can be built. Rather than sending an actual sampled sound, a code corresponding to the sound is determined. CELP codecs take this information (which by itself would produce a very robot-like sound) and attempt to add the personality back in. (Of course, there is much more to it than that.) Jason Woodward’s Speech Coding page (http://www-mobile.ecs.soton.ac.uk/speech_codecs/) is a source of helpful information for the non-mathematically inclined. This is fairly heavy stuff, though, so wear your thinking cap.
[112] If you want to learn all about MPEG audio, do a web search for Davis Pan’s paper titled “A Tutorial on MPEG/Audio Compression.”