Implementation and Analysis of Huffman Coding
With Huffman coding, we try to compress data by encoding symbols as Huffman codes generated in a Huffman tree. To uncompress the data, we rebuild the Huffman tree used in the compression process and convert each code back to the symbol it represents. In the implementation presented here, a symbol in the original data is one byte.
huffman_compress
The huffman_compress operation
(see Example 14.3) compresses data
using Huffman coding. It begins by scanning the data to determine
the frequency of each symbol. The frequencies are placed in an
array, freqs
. After scanning the data,
the frequencies are scaled so that each can be represented in a
single byte. This is done by determining the maximum number of times
any symbol occurs in the data and adjusting the other frequencies
accordingly. Since symbols that do not occur in the data should be
the only ones with frequencies of 0, we perform a simple test to
ensure that any nonzero frequencies that scale to less than 1 end up
being set to 1 instead of 0.
Once we have determined and scaled the frequencies, we call
build_tree to build the Huffman tree. The
build_tree function begins by inserting into a
priority queue one binary tree for each symbol occurring at least
once in the data. Nodes in the trees are
HuffNode
structures (see Example 14.1). This structure consists of two members:
symbol
is a symbol from the data (used
only in leaf nodes), and freq
is a frequency. Each tree initially contains ...
Get Mastering Algorithms with C now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.