With Huffman coding, we try to compress data by encoding symbols as Huffman codes generated in a Huffman tree. To uncompress the data, we rebuild the Huffman tree used in the compression process and convert each code back to the symbol it represents. In the implementation presented here, a symbol in the original data is one byte.
The huffman_compress operation
(see Example 14.3) compresses data
using Huffman coding. It begins by scanning the data to determine
the frequency of each symbol. The frequencies are placed in an
freqs. After scanning the data,
the frequencies are scaled so that each can be represented in a
single byte. This is done by determining the maximum number of times
any symbol occurs in the data and adjusting the other frequencies
accordingly. Since symbols that do not occur in the data should be
the only ones with frequencies of 0, we perform a simple test to
ensure that any nonzero frequencies that scale to less than 1 end up
being set to 1 instead of 0.
Once we have determined and scaled the frequencies, we call
build_tree to build the Huffman tree. The
build_tree function begins by inserting into a
priority queue one binary tree for each symbol occurring at least
once in the data. Nodes in the trees are
HuffNode structures (see Example 14.1). This structure consists of two members:
symbol is a symbol from the data (used
only in leaf nodes), and
freq is a frequency. Each tree initially contains ...