Chapter 10. Dealing with Boolean Data

My favorite data joke is, “What’s a ghost’s favorite data type?”

“BOO-lean.”

Now that that’s out of the way, let’s talk about what is potentially the simplest data type. Boolean data sits at the heart of a lot of what we do in data analytics, which makes it an important part of data preparation. In this chapter, we’ll cover what it is and why it is useful when analyzing data.

What Is Boolean Data?

The word Boolean comes from the mathematician George Boole. He was the first professor of mathematics at Queen’s College, Cork (now University College Cork), and his algebra of logic was eventually applied to computing. Boolean data is simply a True or False response to a conditional statement or test.
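To make the idea of a conditional test concrete, here is a minimal sketch in Python rather than in any particular data preparation tool. The order values, the threshold, and the variable names are all made up for illustration.

```python
# Hypothetical order values and threshold, invented for this example.
order_values = [120, 45, 300, 80]
threshold = 100

# Each comparison is a conditional test that returns True or False.
is_large_order = [value > threshold for value in order_values]

print(is_large_order)  # [True, False, True, False]
```

Each record gets exactly one of two answers, which is all a Boolean field ever holds.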

Why Is It So Useful in Data Analysis?

The response of True or False is often encoded as 1 or 0 behind the scenes in the software we use. Because computing is based on 1s and 0s, Boolean data is easily processed by a computer, and calculations that use it tend to be very quick.
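Python happens to make this encoding easy to see, because its bool type is a subclass of int. The short sketch below illustrates the 1-or-0 idea in Python only. Other tools may store Boolean values differently, so treat it as an analogy rather than a description of any specific product.

```python
# In Python, True and False behave as the integers 1 and 0.
as_integers = [int(flag) for flag in (True, False)]

# Because of that, Boolean values can be added like numbers.
total = True + True + False

print(as_integers)  # [1, 0]
print(total)        # 2
```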

A simple column of 1 or 0 responses is actually amazingly useful in data analysis for many reasons beyond just performance:

Indicators

Here, an indicator refers to a field or set of fields that indicate whether or not each record fits some criteria. These can be analyzed very simply in most data tools. For example, if you want to count how many customers have a certain product type, it’s very simple to sum a Boolean indicator of 1s and 0s in most tools (Figure 10-1).

Figure 10-1. Indicators demonstrated ...
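The counting example can also be sketched in a few lines of Python. The customer records and the has_premium field below are hypothetical, invented purely to show that summing a 1-or-0 indicator gives the count of matching records.

```python
# Hypothetical customer records with a 1-or-0 indicator field.
customers = [
    {"customer": "A", "has_premium": 1},
    {"customer": "B", "has_premium": 0},
    {"customer": "C", "has_premium": 1},
    {"customer": "D", "has_premium": 0},
]

# Summing the indicator counts the customers who have the product type.
premium_count = sum(row["has_premium"] for row in customers)

# Dividing by the number of records gives the share of customers.
premium_share = premium_count / len(customers)

print(premium_count)  # 2
print(premium_share)  # 0.5
```

The same indicator answers both "how many?" and "what proportion?" without any extra filtering step.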
