## Description of Huffman Coding

One of the oldest and most elegant forms of data
compression is Huffman coding, an algorithm based on minimum
redundancy coding. Minimum redundancy coding suggests that if we know how often different symbols
occur in a set of data, we can represent the symbols in a way that
makes the data require less space. The idea is to encode symbols that
occur more frequently with fewer bits than those that occur less
frequently. It is important to realize that a symbol is not
necessarily a character of text: a symbol can be any amount of data we
choose, but it is often one byte’s worth.
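To make the idea concrete, here is a small C sketch. The code table and sample string are hypothetical, chosen only so that the most frequent symbol receives the shortest code; a real Huffman code would be derived from the symbols' actual frequencies.

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    /* A hypothetical prefix code: frequent symbols get shorter codes. */
    const char *codes[] = {"0", "10", "110", "111"};   /* 'a','b','c','d' */
    const char *data = "aabacabaad";                   /* 'a' dominates   */
    size_t n = strlen(data), i, bits = 0;

    for (i = 0; i < n; i++) {
        const char *code = codes[data[i] - 'a'];
        fputs(code, stdout);
        bits += strlen(code);
    }
    printf("\n%zu bits, versus %zu with fixed 8-bit symbols\n", bits, 8 * n);
    return 0;
}
```

Here the ten symbols encode in 16 bits rather than the 80 bits that fixed 8-bit symbols would require, because the dominant symbol `'a'` costs only one bit per occurrence.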

### Entropy and Minimum Redundancy

To begin, let’s revisit the concept of entropy
introduced at the beginning of the chapter. Recall that every set of
data has some informational content, which is called its entropy.
The entropy of a set of data is the sum of the entropies of each of
its symbols. The entropy *S* of a symbol
*z* is defined as:

$$S_z = -\lg P_z$$

where $P_z$ is the probability of *z* being found in the data. If it is known exactly how many times *z* occurs, $P_z$ is referred to as the *frequency* of *z*. As an example, if *z* occurs 8 times in 32 symbols, or one-fourth of the time, the entropy of *z* is:

$$-\lg(1/4) = 2 \text{ bits}$$

This means that using any more than two bits to represent *z* is wasteful. Considering that we normally represent a symbol using eight bits (one byte), we see that compression here has the potential to improve the representation a great deal.
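To see how this arithmetic plays out in code, here is a minimal C sketch that treats each byte as a symbol, tallies frequencies, and sums the entropies of all symbol occurrences. The function name `entropy` and the test data are illustrative assumptions, not something from the text.

```c
#include <math.h>
#include <stdio.h>

/* Sum the entropies of all symbols in a buffer, treating each byte as a
 * symbol: each of the count[z] occurrences of z contributes -lg(P_z) bits,
 * where P_z = count[z] / size. (Illustrative sketch, not from the text.) */
double entropy(const unsigned char *data, size_t size) {
    size_t count[256] = {0};
    double total = 0.0;
    size_t i;

    for (i = 0; i < size; i++)
        count[data[i]]++;

    for (i = 0; i < 256; i++)
        if (count[i] > 0) {
            double p = (double)count[i] / (double)size;   /* P_z */
            total += (double)count[i] * -log2(p);         /* count * -lg(P_z) */
        }

    return total;
}

int main(void) {
    /* Build 32 symbols in which 'z' occurs 8 times (P_z = 1/4), so each
     * occurrence of 'z' carries -lg(1/4) = 2 bits, 16 bits in total. */
    unsigned char data[32];
    size_t i;

    for (i = 0; i < 32; i++)
        data[i] = (i % 4 == 0) ? 'z' : (unsigned char)('a' + i % 3);

    printf("Total entropy: %.2f bits\n", entropy(data, sizeof(data)));
    return 0;
}
```

The total reported by the sketch is the minimum number of bits needed to represent the whole buffer under this model; dividing by the number of symbols gives the average cost per symbol.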

Table 14.1 presents an example ...