Data Wrangling with Python

by Jacqueline Kazil, Katharine Jarmul
February 2016
Beginner to intermediate
508 pages
12h 27m
English
O'Reilly Media, Inc.

Chapter 8. Data Cleanup: Standardizing and Scripting

You’ve learned how to match, parse, and find duplicates in your data, and you’ve started exploring the wonderful world of data cleanup. As you grow to understand your datasets and the questions you’d like to answer with them, you’ll want to think about standardizing your data as well as automating your cleanup.

In this chapter, we’ll explore how and when to standardize your data and when to test and script your data cleanup. If you are managing regular updates or additions to the dataset, you’ll want to make the cleanup process as efficient and clear as possible so you can spend more time analyzing and reporting. We’ll begin by standardizing and normalizing your dataset and determining what to do if your dataset is not normalized.

Normalizing and Standardizing Your Data

Depending on your data and the type of research you are conducting, standardizing and normalizing your dataset might mean calculating new values using the values you currently have, or it might mean applying standardizations or normalizations across a particular column or value.

From a statistical view, normalization often means calculating new values from a dataset so that the data sits on a particular scale. For example, you might need to normalize test scores to a common scale so you can accurately view their distribution. You might also need to normalize data so you can accurately see percentiles, or compare percentiles across different groups (or cohorts). ...
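
The kinds of normalization mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the book's own code: the function names (`min_max_scale`, `z_scores`, `percentile_rank`) and the sample scores are assumptions made up for this example, and it relies only on the standard library.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """Express each value as its distance from the mean, in standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def percentile_rank(values, score):
    """Percentage of values below `score`, counting ties as half."""
    below = sum(1 for v in values if v < score)
    equal = sum(1 for v in values if v == score)
    return 100.0 * (below + 0.5 * equal) / len(values)


# Hypothetical test scores, invented for illustration only.
scores = [52, 61, 70, 70, 88, 95]

scaled = min_max_scale(scores)
standardized = z_scores(scores)
rank_of_70 = percentile_rank(scores, 70)
```

Min-max scaling keeps the shape of the distribution while fixing its range, whereas z-scores make values from differently scaled tests comparable. To compare cohorts, one option is to call `percentile_rank` once per group, passing only that group's scores.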



Publisher Resources

ISBN: 9781491948804