Chapter 4

Huffman coding

In this chapter, we formally study the Huffman coding algorithms and apply the theory in Chapter 2.

4.1 Static Huffman coding

Huffman coding is a successful compression method used originally for text compression. In any text, some characters occur far more frequently than others. For example, in English text, the letters E, A, O, T are normally used much more frequently than J, Q, X.

Huffman’s idea is, instead of using a fixed-length code such as 8 bit extended ASCII or DBCDIC for each symbol, to represent a frequently occurring character in a source with a shorter codeword and to represent a less frequently occurring one with a longer codeword. Hence the total number of bits of this representation is significantly ...

Get Fundamental Data Compression now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.