Chapter 2. Do Not Skip This Chapter

Even if you are familiar with binary numbers, DO NOT SKIP THIS CHAPTER. We are going to begin digging into information theory as well, which is required for understanding the rest of this book.

Understanding Binary

It might seem a bit odd to start a book about data compression with a primer on binary numbers. Bear with us here. Everything in data compression is about reducing the number of bits used to represent a given data set. To expand on this concept, and the ramifications of its mathematics, let’s just take a second and make sure everyone is on the same page.

Base 10 System

Modern human mathematics is built around the decimal (base 10) number system.

This system makes it possible for us to use the digits [0,1,2,3,4,5,6,7,8,9] strung together to represent number values. Back in elementary school, you might have been exposed to the concept of numeric columns, where, for example, the value 193 is split into three columns of hundreds, tens, and ones.

Hundreds  Tens  Ones
1         9     3

Effectively, 193 is equivalent to 1 * 100 + 9 * 10 + 3. And as soon as you grasped that pattern, maybe you realized that you could count to any number.
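The column pattern can be sketched in a few lines of Python. This is only an illustration, not something from the book: the function name place_values and its base parameter are assumptions made for the example. It peels digits off the right-hand side of a number and pairs each one with the weight of its column.

```python
def place_values(n, base=10):
    """Split a non-negative integer into (digit, column_weight) pairs,
    most significant column first. Hypothetical helper for illustration."""
    if n == 0:
        return [(0, 1)]
    pairs = []
    weight = 1  # weight of the rightmost ("ones") column
    while n > 0:
        n, digit = divmod(n, base)  # strip the rightmost digit
        pairs.append((digit, weight))
        weight *= base  # move one column to the left
    return pairs[::-1]


columns = place_values(193)
print(columns)                          # [(1, 100), (9, 10), (3, 1)]
print(sum(d * w for d, w in columns))   # 193
```

Multiplying each digit by its column weight and adding the results gives back the original number, which is the pattern described in the text.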

Later, when you learned about exponents, you were able to replace the “hundreds” and “tens” with their “base ten to the power” equivalents, and a new pattern emerged.

10^2  10^1  10^0
1     9     3

So:

193 = 1 * 100 + 9 * 10 + 3 = (1 * 10^2) + (9 * 10^1) + (3 * 10^0)
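The same sum can be written with exponents in code. The sketch below assumes a hypothetical helper, from_digits, that is not part of the book: it takes the digits of a number from left to right and rebuilds the value by multiplying each digit by the base raised to the power of its column.

```python
def from_digits(digits, base=10):
    """Rebuild a number from its digits (most significant first) by summing
    digit * base**position. Hypothetical helper for illustration."""
    total = 0
    # The rightmost digit sits in column 0, the next in column 1, and so on.
    for position, digit in enumerate(reversed(digits)):
        total += digit * base ** position
    return total


print(from_digits([1, 9, 3]))  # 193, i.e. (1 * 10**2) + (9 * 10**1) + (3 * 10**0)
```

The base is left as a parameter here only to make the role of the number 10 visible in the formula.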

Because each column can contain ...
