Chapter 9. Tidy Data with tidyr

Introduction

Happy families are all alike; every unhappy family is unhappy in its own way.

Leo Tolstoy

Tidy datasets are all alike, but every messy dataset is messy in its own way.

Hadley Wickham

In this chapter, you will learn a consistent way to organize your data in R, an organization called tidy data. Getting your data into this format requires some up-front work, but that work pays off in the long term. Once you have tidy data and the tidy tools provided by packages in the tidyverse, you will spend much less time munging data from one representation to another, allowing you to spend more time on the analytic questions at hand.

This chapter will give you a practical introduction to tidy data and the accompanying tools in the tidyr package. If you’d like to learn more about the underlying theory, you might enjoy the Tidy Data paper published in the Journal of Statistical Software.

Prerequisites

In this chapter we’ll focus on tidyr, a package that provides a bunch of tools to help tidy up your messy datasets. tidyr is a member of the core tidyverse.

library(tidyverse)

Tidy Data

You can represent the same underlying data in multiple ways. The following example shows the same data organized in four different ways. Each dataset shows the same values of four variables (country, year, population, and cases), but each dataset organizes the values in a different way:

table1
#> # A tibble: 6 × 4
#>   country      year  cases population
#>   <chr>       <int> ...
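
To make "the same data, organized differently" concrete, here is a minimal sketch. It assumes the `table2` dataset that ships with tidyr alongside `table1` (it is not shown in this excerpt), and it uses `pivot_wider()`, which requires tidyr 1.0.0 or later. The names `wide` and `same_values` are made up for this illustration.

```r
library(tidyr)  # provides table1, table2, and pivot_wider()

# table2 stores one row per country, year, and measure: the `type` column
# names the measure ("cases" or "population") and `count` holds its value.
# Spreading `type` into columns yields one row per country and year.
wide <- pivot_wider(table2, names_from = type, values_from = count)

# Compare the reshaped result with table1, column by column.
same_values <- all(
  wide$country == table1$country,
  wide$year == table1$year,
  wide$cases == table1$cases,
  wide$population == table1$population
)
same_values
```

If the two layouts really do hold the same values, `same_values` should be `TRUE`: nothing about the data changed, only how it is arranged into rows and columns.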
