Chapter 3. Data Transformation with dplyr

Introduction

Visualization is an important tool for insight generation, but it is rare that you get the data in exactly the form you need. Often you’ll need to create some new variables or summaries, or maybe you just want to rename the variables or reorder the observations to make the data a little easier to work with. You’ll learn how to do all that (and more!) in this chapter, which will teach you how to transform your data using the dplyr package and a new dataset on flights departing New York City in 2013.
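As a rough preview, here is a sketch of each task mentioned above applied to `flights`. It assumes the dplyr and nycflights13 packages are installed. `mutate()` creates a new variable, `rename()` renames one, `arrange()` reorders the observations, and `summarise()` collapses the data to a summary. The derived column name `gain`, the new name `tail_num`, and the object names are illustrative choices, not part of the dataset.

```r
library(dplyr)
library(nycflights13)

# Create a new variable: minutes made up in the air (illustrative name `gain`)
with_gain <- mutate(flights, gain = dep_delay - arr_delay)

# Rename a variable: `tailnum` becomes `tail_num` (new name chosen for illustration)
renamed <- rename(with_gain, tail_num = tailnum)

# Reorder the observations: longest departure delays first (NAs sort last)
by_delay <- arrange(renamed, desc(dep_delay))

# Collapse to a one-row summary, ignoring missing values
delay_summary <- summarise(
  by_delay,
  mean_dep_delay = mean(dep_delay, na.rm = TRUE),
  mean_gain = mean(gain, na.rm = TRUE)
)
delay_summary
```

Each step here stores its result in a new object so the individual verbs stay visible; in practice the same steps are usually chained together with a pipe.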

Prerequisites

In this chapter we’re going to focus on how to use the dplyr package, another core member of the tidyverse. We’ll illustrate the key ideas using data from the nycflights13 package, and use ggplot2 to help us understand the data.

library(nycflights13)
library(tidyverse)
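Because dplyr and ggplot2 are used side by side in this chapter, here is a minimal sketch of how they hand off to each other. dplyr counts the flights in each month, and ggplot2 plots the counts. It assumes the tidyverse and nycflights13 are installed and that you are on R 4.1 or later for the native pipe `|>`. The object names `monthly` and `p` are illustrative.

```r
library(nycflights13)
library(tidyverse)

# dplyr: count the flights that departed in each month (columns `month` and `n`)
monthly <- flights |>
  count(month)

# ggplot2: draw the monthly counts as a bar chart
p <- ggplot(monthly, aes(x = factor(month), y = n)) +
  geom_col() +
  labs(x = "Month", y = "Number of flights")

# Printing the plot object draws it
p
```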

Take careful note of the conflicts message that’s printed when you load the tidyverse. It tells you that dplyr masks (overwrites) some functions in base R, namely filter() and lag() from the stats package. If you want to use the base version of these functions after loading dplyr, you’ll need to use their full names: stats::filter() and stats::lag().
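To make the masking concrete, here is a small sketch that assumes only dplyr is attached. A bare `filter()` now resolves to dplyr’s version, while the stats functions remain reachable by their full names. The two `lag()` functions also do different things: dplyr’s shifts the values of a vector, whereas the stats version keeps the values and shifts the time base of a time series.

```r
library(dplyr)

# After dplyr is attached, a bare filter() call finds dplyr's version first
identical(filter, dplyr::filter)

# The stats version is still reachable through its full name.
# stats::filter() applies a linear filter; here, a centered 3-point moving average
stats::filter(1:5, rep(1/3, 3))

# dplyr::lag() shifts the values of a vector, padding with NA
dplyr::lag(1:3)

# stats::lag() keeps the values and moves the time series one period earlier
stats::lag(ts(1:3))
```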

nycflights13

To explore the basic data manipulation verbs of dplyr, we’ll use nycflights13::flights. This data frame contains all 336,776 flights that departed from New York City in 2013. The data comes from the US Bureau of Transportation Statistics, and is documented in ?flights:

flights
#> # A tibble: 336,776 × 19
#> year month ...
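Printing a tibble shows only the first few rows and the columns that fit on screen. A sketch like the following, assuming dplyr and nycflights13 are installed, checks the description above from a few other angles. `glimpse()` lists every column with its type, and `distinct()` and `count()` confirm that the data covers a single year and three New York City area airports (EWR, JFK, and LGA).

```r
library(dplyr)
library(nycflights13)

# Dimensions: one row per flight, 19 columns
dim(flights)

# A transposed view showing every column's name, type, and first values
glimpse(flights)

# Confirm the coverage: a single year, and flights per origin airport
distinct(flights, year)
count(flights, origin)
```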


R for Data Science by Hadley Wickham, Garrett Grolemund