Chapter 3. Data Transformation with dplyr
Introduction
Visualization is an important tool for insight generation, but it is rare that you get the data in exactly the right form you need. Often you'll need to create some new variables or summaries, or maybe you just want to rename the variables or reorder the observations in order to make the data a little easier to work with. You'll learn how to do all that (and more!) in this chapter, which will teach you how to transform your data using the dplyr package and a new dataset on flights departing New York City in 2013.
Prerequisites
In this chapter we're going to focus on how to use the dplyr package, another core member of the tidyverse. We'll illustrate the key ideas using data from the nycflights13 package, and use ggplot2 to help us understand the data.
library(nycflights13)
library(tidyverse)
Take careful note of the conflicts message that's printed when you load
the tidyverse. It tells you that dplyr masks some functions in base
R: the base versions still exist, but the dplyr versions take precedence.
If you want to use the base version of these functions after loading
dplyr, you'll need to use their full names: stats::filter() and stats::lag().
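As a minimal sketch of what using the full names looks like in practice, the snippet below calls both versions of each function with an explicit package prefix. The vector x and the data frame df are hypothetical toy data, not part of the chapter's dataset.

```r
library(dplyr)

x <- c(1, 2, 3, 4, 5)
df <- data.frame(x = x)   # hypothetical toy data

# dplyr::filter() keeps the rows of a data frame that match a condition
rows_kept <- dplyr::filter(df, x > 3)

# stats::filter() applies a linear filter to a numeric series
# (here a 3-point centered moving average)
moving_avg <- stats::filter(x, rep(1 / 3, 3))

# dplyr::lag() shifts the values of a vector by one position
shifted <- dplyr::lag(x)

# stats::lag() shifts the time base of a time series and leaves the values unchanged
shifted_ts <- stats::lag(ts(x), k = 1)
```

Writing the prefix even for the dplyr versions is optional once dplyr is loaded, but it makes clear which function is meant when both are in play.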
nycflights13
To explore the basic data manipulation verbs of dplyr, we'll use
nycflights13::flights. This data frame contains all 336,776 flights
that departed from New York City in 2013. The data comes from the US
Bureau of Transportation Statistics, and is documented in ?flights:
flights
#> # A tibble: 336,776 × 19
#> year month ...
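Printing the data frame shows only the columns that fit on screen. One possible way to see the overall shape and every column at once, assuming the nycflights13 and dplyr packages are installed, is sketched below; flights_dim is a hypothetical variable name used only for illustration.

```r
library(nycflights13)
library(dplyr)

# Number of rows, then number of columns
flights_dim <- dim(flights)
flights_dim

# One line per column: name, type, and the first few values
glimpse(flights)
```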