Compression has been a recurring theme in this book, for image formats (such as GIF and PNG), command-line compression tools (such as bzip2 and gzip), and HTTP compression through gzip and deflate. Despite this diversity in how compression is used, the majority of implementations actually employ the same underlying algorithms. This appendix provides a brief examination of the most common compression methods.
THE LZW FAMILY
The origins of the widely used LZW family of compression algorithms have their roots in two papers written by Abraham Lempel and Jacob Ziv in 1977 and 1978. These papers defined the LZ77 and LZ78 algorithms, two contrasting approaches to achieve compression by spotting repeating patterns.
LZ77 looks for patterns of repeating data in previously seen input and replaces the data with a reference to the first instance of the pattern. The reference takes the form of a length-offset pair, indicating the length of the pattern, and it’s offset (from the current position). Consider the following stream of data:
The second instance of ABCD can be replaced with a length-offset pair, like so:
Here, the number 4 indicates the length of the pattern, and the number 9 indicates the offset (nine characters back from the current position).
In many cases, it would be impractical for LZ77 to search the whole stream of previous data for matches. It could potentially be hundreds of megabytes, and the CPU and memory ...