Data Wrangling with Python

by Jacqueline Kazil, Katharine Jarmul
February 2016
Beginner to intermediate
508 pages
12h 27m
English
O'Reilly Media, Inc.

Chapter 7. Data Cleanup: Investigation, Matching, and Formatting

Cleaning up your data is not the most glamorous of tasks, but it’s an essential part of data wrangling. Becoming a data cleaning expert requires precision and a healthy knowledge of your area of research or study. Knowing how to properly clean and assemble your data will set you miles apart from others in your field.

Python is well designed for data cleanup; it helps you build functions around patterns, eliminating repetitive work. As we’ve already seen in our code so far, learning to fix repetitive problems with scripts and code can turn hours of manual work into a script you run once.

In this chapter, we will take a look at how Python can help you clean and format your data. We’ll also use Python to locate duplicates and errors in our datasets. We will continue learning about cleanup, especially automating our cleanup and saving our cleaned data, in the next chapter.

Why Clean Data?

Some data may come to you properly formatted and ready to use. If this is the case, consider yourself lucky! Most data, even if it is cleaned, has some formatting inconsistencies or readability issues (e.g., acronyms or mismatched description headers). This is especially true if you are using data from more than one dataset. It’s unlikely your data will properly join and be useful unless you spend time formatting and standardizing it.
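To make the point about mismatched headers concrete, here is a minimal sketch of standardizing column headers from two datasets so their rows can be matched on a shared key. The sample rows, the `HEADER_NAMES` lookup, and the helper names `standardize_header` and `standardize_row` are all hypothetical, invented for illustration; they are not taken from the book's datasets or code.

```python
import re

# Hypothetical lookup from acronym-style headers to readable names.
# These entries are illustrative assumptions, not real dataset headers.
HEADER_NAMES = {
    "cust_id": "customer_id",
    "dob": "date_of_birth",
}


def standardize_header(header):
    """Lowercase a header, join its words with underscores, expand known acronyms."""
    cleaned = re.sub(r"[^0-9a-z]+", "_", header.strip().lower()).strip("_")
    return HEADER_NAMES.get(cleaned, cleaned)


def standardize_row(row):
    """Return a copy of a row (a dict) with every key standardized."""
    return {standardize_header(key): value for key, value in row.items()}


# Two made-up rows whose headers disagree in case, spacing, and naming.
survey_row = {" Cust ID ": "17", "DOB": "1990-04-02"}
billing_row = {"cust_id": "17", "Amount Due": "12.50"}

clean_survey = standardize_row(survey_row)
clean_billing = standardize_row(billing_row)

# Both rows now expose the same key name, so they can be matched on it.
print(clean_survey["customer_id"] == clean_billing["customer_id"])
```

Once both rows share a `customer_id` key, joining them becomes a simple dictionary lookup. Real datasets usually need a longer lookup table and some manual review of headers that do not map cleanly.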

Note

Cleaning your data makes for easier storage, search, and reuse. As we explored in Chapter 6 ...

ISBN: 9781491948804