library(tidyverse)
Chapter 5. Data Tidying
Introduction
“Happy families are all alike; every unhappy family is unhappy in its own way.” —Leo Tolstoy
“Tidy datasets are all alike, but every messy dataset is messy in its own way.” —Hadley Wickham
In this chapter, you will learn a consistent way to organize your data in R using a system called tidy data. Getting your data into this format requires some work up front, but that work pays off in the long term. Once you have tidy data and the tidy tools provided by packages in the tidyverse, you will spend much less time munging data from one representation to another, allowing you to spend more time on the data questions you care about.
In this chapter, you’ll first learn the definition of tidy data and see it applied to a simple toy dataset. Then we’ll dive into the primary tool you’ll use for tidying data: pivoting. Pivoting allows you to change the form of your data without changing any of the values.
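To make "changing the form without changing the values" concrete, here is a minimal sketch using tidyr's pivot_longer() and pivot_wider(). The toy tibble, its column names, and its numbers are made up for illustration and are not taken from the chapter's datasets.

```r
library(tidyr)

# Hypothetical toy data (an assumption for this sketch):
# one row per country, one column per year.
wide <- tibble::tibble(
  country = c("A", "B"),
  `1999` = c(10, 20),
  `2000` = c(15, 25)
)

# Lengthen: the year columns become rows. The column names go into
# a new `year` column and the cell values into a new `cases` column.
long <- pivot_longer(
  wide,
  cols = c(`1999`, `2000`),
  names_to = "year",
  values_to = "cases"
)

# Widen: the inverse operation restores the original layout.
wide_again <- pivot_wider(long, names_from = year, values_from = cases)
```

Both long and wide_again hold the same four case counts as wide; only the arrangement of rows and columns differs.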
Prerequisites
In this chapter, we’ll focus on tidyr, a package that provides a bunch of tools to help tidy up your messy datasets. tidyr is a member of the core tidyverse.
From this chapter on, we’ll suppress the loading message from library(tidyverse).
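If you want the same quiet behavior in your own scripts, one option is base R's suppressPackageStartupMessages(). This is a sketch of one possible approach, not necessarily the mechanism used to build this book.

```r
# Attach the tidyverse without printing the startup message.
# Assumption: this is one way to get quiet loading in a script;
# the book itself may suppress the message differently.
suppressPackageStartupMessages(library(tidyverse))

# The core packages, including tidyr, are still attached as usual.
tidyr_attached <- "package:tidyr" %in% search()
```

The packages load exactly as before; only the startup message is hidden.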
Tidy Data
You can represent the same underlying data in multiple ways. The following example shows the same data organized in three different ways. Each dataset shows the same values of four variables: country, year, population, and number of documented cases of tuberculosis ...