Data compression is the process of reducing the number of
bits used to represent data. It is one of the most significant results
of *information theory*, an area of mathematics that addresses various ways to
manage and manipulate information. Data compression entails two
processes: in one process the data is compressed, or
*encoded*, to reduce its size; in a second process it
is uncompressed, or *decoded*, to return it to its
original state.

To understand why data compression is possible, we must first
understand that all data can be characterized by some informational
content, called its *entropy* (a term borrowed from thermodynamics). Compression is possible because most data is represented
with more bits than its entropy suggests is optimal. To gauge the
effectiveness of compression, we divide the size of the compressed data
by its original size and subtract the result from 1.
This value is known as the data’s *compression ratio*; the larger it is,
the more the data has been reduced.
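As a rough illustration of these two quantities, the following Python sketch estimates entropy from byte frequencies and computes the compression ratio as defined above. It assumes a simple model in which each byte is treated as an independent symbol; the function names `entropy_bits_per_byte` and `compression_ratio` and the sample string are hypothetical, chosen only for this example.

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Estimate entropy in bits per byte from byte frequencies.

    Assumption: bytes are independent symbols (a zeroth-order model).
    """
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Return 1 minus (compressed size / original size)."""
    return 1.0 - compressed_size / original_size


# Hypothetical sample: 8 A's, 4 B's, 2 C's, 2 D's, stored at 8 bits each.
sample = b"AAAAAAAABBBBCCDD"
h = entropy_bits_per_byte(sample)      # 1.75 bits per byte

original_bits = 8 * len(sample)        # 128 bits as stored
ideal_bits = h * len(sample)           # 28 bits under this model
ratio = compression_ratio(original_bits, ideal_bits)

print(f"entropy: {h:.2f} bits/byte, best-case ratio: {ratio:.3f}")
```

Under this model the sample carries 1.75 bits of information per byte but is stored with 8, so an ideal encoder could approach a compression ratio of about 0.781. Real encoders add overhead and may model the data differently, so the figure is only a bound for this particular model.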

In the broadest sense, data compression methods are divided into
two classes: *lossy* and
*lossless*. In lossy compression we accept a certain loss of accuracy in
exchange for greater compression ratios. This is acceptable in some
applications, such as graphics and sound processing, provided the
degradation is managed carefully. More often, however, we use lossless
compression, which ensures that an exact copy of the original data is
reproduced when it is uncompressed.
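The difference between the two classes can be sketched in Python. This example uses the standard `zlib` module purely as a stand-in lossless encoder and decoder, and it imitates lossy compression by discarding the low four bits of each byte before encoding. The sample data and the quantization step are assumptions made for illustration, not methods taken from this chapter.

```python
import zlib

# Hypothetical sample data: every byte value, repeated four times.
original = bytes(range(256)) * 4

# Lossless: encode, then decode, and recover an exact copy.
encoded = zlib.compress(original)
decoded = zlib.decompress(encoded)
lossless_exact = decoded == original

# Lossy (illustrative only): discard the low 4 bits of each byte,
# then encode the coarser data. The discarded bits cannot be recovered.
quantized = bytes(b & 0xF0 for b in original)
lossy_encoded = zlib.compress(quantized)
restored = zlib.decompress(lossy_encoded)
lossy_exact = restored == original

# Largest difference between an original byte and its restored value.
max_error = max(abs(a - b) for a, b in zip(original, restored))

print(f"lossless exact copy: {lossless_exact}, encoded size: {len(encoded)}")
print(f"lossy exact copy: {lossy_exact}, encoded size: {len(lossy_encoded)}")
print(f"largest per-byte error after lossy round trip: {max_error}")
```

The lossless round trip returns the original bytes exactly, while the lossy round trip returns data in which each byte may differ from the original by as much as 15. Whether that error is acceptable depends on the application.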

This chapter focuses on lossless compression, for which ...
