Implementation and Analysis of LZ77
With LZ77, we try to compress data by encoding phrases from a look-ahead buffer as tokens referencing phrases in a sliding window. To uncompress the data, we decode each token into the phrase or symbol it represents. To do this, we must continually update the sliding window so that at any one time it looks the same as it did during the compression process. In the implementation presented here, a symbol in the original data is one byte.
lz77_compress
The lz77_compress operation (see Example 14.5) compresses data using LZ77. It begins by writing the number of symbols in the data to the buffer of compressed data and initializing the sliding window and look-ahead buffer. The look-ahead buffer is then loaded with symbols.
Compression takes place inside of a loop that iterates until
there are no more symbols to process. We use
ipos
to keep track of the current byte
being processed in the original data, and
opos
to keep track of the current bit we
are writing to the buffer of compressed data. During each iteration
of the loop, we call compare_win to determine
the longest phrase in the look-ahead buffer that matches one in the
sliding window. The compare_win function
returns the length of the longest match.
When a match is found,
compare_win sets
offset
to the position of the match in
the sliding window and next
to the symbol
in the look-ahead buffer immediately after the match. In this case,
we write a phrase token to the compressed data (see Figure 14.4 ...
Get Mastering Algorithms with C now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.