6. Tidy Data

6.1 Introduction

As mentioned in Chapter 4, Hadley Wickham,1 one of the more prominent members of the R community, introduced the concept of tidy data in a paper in the Journal of Statistical Software.2 Tidy data is a framework for structuring data sets so they can be easily analyzed and visualized. It can be thought of as a goal to aim for when cleaning data. Once you understand what tidy data is, your data analysis, visualization, and collection will become much easier.

1. Hadley Wickham: http://hadley.nz/

2. Tidy data paper: http://vita.had.co.nz/papers/tidy-data.pdf

What is tidy data? Hadley Wickham’s paper defines it as meeting the following criteria:

■ Each row is an observation.

■ Each column is a variable. ...
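To make the first two criteria concrete, here is a minimal sketch, assuming pandas is installed. The student names and scores are made up for illustration. The "wide" table stores years as column headers, so the headers are really values of a year variable rather than variables themselves. `DataFrame.melt` is one pandas method that reshapes such a table so that each column holds one variable and each row holds one observation.

```python
import pandas as pd

# Hypothetical "wide" table: one row per student, one column per year.
# The headers "2019" and "2020" are values of a year variable, so this
# layout breaks the rule that each column is a variable.
wide = pd.DataFrame({
    "student": ["ana", "ben"],
    "2019": [85, 72],
    "2020": [90, 78],
})

# melt() keeps "student" as an identifier column, moves the year headers
# into a single "year" column, and moves the cell values into a single
# "score" column, giving one row per (student, year) observation.
tidy = wide.melt(id_vars="student", var_name="year", value_name="score")
print(tidy)
```

The reshaped table has three columns (student, year, score) and four rows, one for each student-year pair, which matches the "each row is an observation" and "each column is a variable" criteria.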
