Chapter 27. Dealing with Nulls

Nulls, or the absence of data, are a fickle beast within data preparation. Experienced data preppers will know almost instinctively how to deal with them, or at least how to manage the challenges that come with a data set containing them. Novice data preppers do not have the same set of use cases or experience to draw on, though, so this chapter covers the basic considerations for working with a data set containing nulls.

 

What Is a Null?

The absence of data is not the same as a zero, a new row, or a space, all of which are actually values. Nulls appear in data sets for many reasons, including:

  • They are the result of mismatched fields in a union.

  • They are the result of mismatched fields in a left, right, or fullOuter join.

  • There’s no original data entry for that record, but other data points in the set exist (i.e., the other fields are not null).

Prep Builder shows the number of null records for any data field in its Profile pane (Figure 27-1).

Nulls in the Volume data field
Figure 27-1. Nulls in the Volume data field

Now that you know what they are and where to find them, let’s look at when it’s acceptable to have null records in your data set and when you should remove or replace them.

When Is a Null OK?

To understand when a null value is acceptable, let’s assess the impact of one. The most common situation where nulls can affect your analysis is when you are averaging ...

Get Tableau Prep: Up & Running now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.