With Huffman coding, we try to compress data by encoding symbols as Huffman codes generated in a Huffman tree. To uncompress the data, we rebuild the Huffman tree used in the compression process and convert each code back to the symbol it represents. In the implementation presented here, a symbol in the original data is one byte.

The *huffman_compress* operation
(see Example 14.3) compresses data
using Huffman coding. It begins by scanning the data to determine
the frequency of each symbol. The frequencies are placed in an
array, * freqs*. After scanning the data,
the frequencies are scaled so that each can be represented in a
single byte. This is done by determining the maximum number of times
any symbol occurs in the data and adjusting the other frequencies
accordingly. Since symbols that do not occur in the data should be
the only ones with frequencies of 0, we perform a simple test to
ensure that any nonzero frequencies that scale to less than 1 end up
being set to 1 instead of 0.

Once we have determined and scaled the frequencies, we call
*build_tree* to build the Huffman tree. The
*build_tree* function begins by inserting into a
priority queue one binary tree for each symbol occurring at least
once in the data. Nodes in the trees are
* HuffNode* structures (see Example 14.1). This structure consists of two members:

`symbol`

`freq`

Start Free Trial

No credit card required