7

Data Normalization

The final point in the original “Tidy Data” paper states that for data to be tidy, “… each type of observational unit forms a table.” In practice, we usually need to combine multiple data sets before we can do an analysis (Chapter 6). But when the goal is to store and manage data with as little duplication, and as little room for error, as possible, we should normalize the data into separate tables, so that a single fix propagates when the tables are combined again.
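To make the idea concrete, here is a minimal sketch using pandas. The data, the column names, and the `song_id` key are all hypothetical and invented for illustration; they are not the data set used in this chapter. The sketch splits one table that mixes two observational units (songs and weekly ratings) into two tables, then shows that correcting a value in one place carries through when the tables are merged again.

```python
import pandas as pd

# Hypothetical data: each row is one weekly rating, so the artist and
# track of a song are repeated on every row for that song.
ratings = pd.DataFrame({
    "artist": ["Artist A", "Artist A", "Artist B", "Artist B"],
    "track": ["Song 1", "Song 1", "Song 2", "Song 2"],
    "week": [1, 2, 1, 2],
    "rating": [87, 82, 91, 87],
})

# Table 1: one row per song, identified by a made-up surrogate key.
songs = ratings[["artist", "track"]].drop_duplicates().reset_index(drop=True)
songs["song_id"] = songs.index

# Table 2: one row per (song, week) observation, pointing at the song by key.
weekly = ratings.merge(songs, on=["artist", "track"])[["song_id", "week", "rating"]]

# Fix the artist name once, in the songs table only.
songs.loc[songs["artist"] == "Artist A", "artist"] = "Artist Alpha"

# Merging the normalized tables applies the fix to every matching row.
combined = weekly.merge(songs, on="song_id")
print(combined)
```

Because the artist name is stored in only one row of `songs`, the correction cannot be applied to some rows and missed on others, which is the kind of error that duplication invites.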

Learning Objectives

  • Identify the differences between tidy data and data normalization

  • Apply data subsetting to split data into normalized parts

7.1 Multiple Observational Units in a Table (Normalization)

One of the simplest ways of ...
