Chapter 23. Filtering

One of the most important factors when cleaning data is deciding whether the data:

  • Can be cleaned up

  • Should be ignored

  • Should be removed

If you decide on the last option, you need to filter your data set. This sounds like an easy decision to make, but it shouldn’t be, especially if you are preparing data for others to use. Being certain that neither you nor the end users will need this data going forward is difficult. Even if you are positive the data isn’t needed, remove it only as the last step in the data prep process before publishing, and only after you’ve considered the following:

  • Does that data give the user context on other data points?

  • Is the data messy but manageable? Just because the data might be hard to tidy up doesn’t mean it couldn’t have value to end users.

  • If the business logic changes—that is, the user has a different business experience—will the data suddenly have meaning?

With those caveats in mind, let’s explore what a filter is and where to use one.

What Is a Filter?

A filter allows you to keep (a Keep Only filter) or remove (an Exclude filter) data from a data set. Once you have decided what you would like to keep or get rid of, you have several types of filters to choose from:

  • Selection

  • Calculation

  • Wildcard

  • Null values
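To make these types concrete outside of Tableau Prep, here is a minimal Python sketch. It is not how Tableau Prep implements filters. The sample rows, the field names (`product`, `sales`), and the helpers `keep_only` and `exclude` are all hypothetical, and the reading of each filter type is an assumption based on its name.

```python
from fnmatch import fnmatchcase

# Hypothetical sample data; the field names are illustrative only.
rows = [
    {"product": "Road Bike", "sales": 1200},
    {"product": "Mountain Bike", "sales": 300},
    {"product": "Helmet", "sales": None},
    {"product": "Bike Lock", "sales": 45},
]

def keep_only(data, predicate):
    """Keep Only filter: retain the rows for which predicate is true."""
    return [row for row in data if predicate(row)]

def exclude(data, predicate):
    """Exclude filter: drop the rows for which predicate is true."""
    return [row for row in data if not predicate(row)]

# Selection (assumed meaning): pick specific values by hand.
selected = keep_only(rows, lambda r: r["product"] in {"Helmet", "Bike Lock"})

# Calculation (assumed meaning): keep rows where a Boolean expression is true.
high_sales = keep_only(
    rows, lambda r: r["sales"] is not None and r["sales"] > 500
)

# Wildcard (assumed meaning): match values against a text pattern.
bikes = keep_only(rows, lambda r: fnmatchcase(r["product"], "*Bike"))

# Null values (assumed meaning): remove rows where a field is missing.
non_null = exclude(rows, lambda r: r["sales"] is None)
```

In this sketch every filter type reduces to a true/false test per row. Keep Only and Exclude differ only in whether the matching rows are retained or dropped.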

There are also two different forms of filters that can be applied within each type:

  • Data field filters remove columns (fields) of data.

  • Data value filters remove rows of data.
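The difference between the two forms can be sketched in plain Python. This is only an illustration under assumed names: the sample rows and the helpers `remove_fields` and `remove_rows` are hypothetical and are not part of Tableau Prep.

```python
# Hypothetical sample data; the field names are illustrative only.
rows = [
    {"order_id": 1, "region": "East", "notes": "rush"},
    {"order_id": 2, "region": "West", "notes": ""},
    {"order_id": 3, "region": "East", "notes": "gift"},
]

def remove_fields(data, fields):
    """Data field filter: drop the named columns from every row."""
    return [
        {name: value for name, value in row.items() if name not in fields}
        for row in data
    ]

def remove_rows(data, predicate):
    """Data value filter: drop the rows for which predicate is true."""
    return [row for row in data if not predicate(row)]

# Field filter: every row survives, but the "notes" column is gone.
without_notes = remove_fields(rows, {"notes"})

# Value filter: every column survives, but non-East rows are gone.
east_only = remove_rows(rows, lambda r: r["region"] != "East")
```

A field filter changes the width of the data set and leaves the row count alone, while a value filter changes the row count and leaves the columns alone.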

Different ...

