Skip to Content
Hands-On Data Visualization
book

Hands-On Data Visualization

by Jack Dougherty, Ilya Ilyankou
April 2021
Beginner to intermediate
471 pages
13h 16m
English
O'Reilly Media, Inc.
Book available
Content preview from Hands-On Data Visualization

Chapter 4. Clean Up Messy Data

More often than not, datasets will be messy and hard to visualize right away. They will have missing values, dates in different formats, text in numeric-only columns, multiple items in the same columns, various spellings of the same name, and other unexpected things. See Figure 4-1 for inspiration. Don’t be surprised if you find yourself spending more time cleaning up data than you do analyzing and visualizing it.

hodv 0401
Figure 4-1. More often than not, raw data looks messy.

In this chapter, you’ll learn about different tools to help you make decisions about which one to use to clean up your data efficiently. We’ll start with basic cleanup methods using Google Sheets in “Smart Cleanup with Google Sheets”, “Find and Replace with Blank”, “Transpose Rows and Columns”, “Split Data into Separate Columns”, and “Combine Data into One Column”. While we feature Google Sheets in our examples, many of these principles (and in some cases, the same formulas) apply to Microsoft Excel, LibreOffice Calc, Mac’s Numbers, or other spreadsheet packages. Next, you’ll learn how to extract table data from text-based PDF documents with Tabula, a free tool used by data journalists and researchers worldwide to analyze spending data, health reports, and all sorts of other datasets that get trapped in PDFs (see “Extract Tables from PDFs with Tabula”). Finally, we will introduce ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Fundamentals of Data Visualization

Fundamentals of Data Visualization

Claus O. Wilke
Hands-On Exploratory Data Analysis with Python

Hands-On Exploratory Data Analysis with Python

Suresh Kumar Mukhiya, Usman Ahmed

Publisher Resources

ISBN: 9781492085997Errata Page