Skip to Content
R for Data Science, 2nd Edition
book

R for Data Science, 2nd Edition

by Hadley Wickham, Mine Çetinkaya-Rundel, Garrett Grolemund
June 2023
Beginner to intermediate
576 pages
12h 57m
English
O'Reilly Media, Inc.
Book available
Content preview from R for Data Science, 2nd Edition

Chapter 3. Data Transformation

Introduction

Visualization is an important tool for generating insight, but it’s rare that you get the data in exactly the right form you need to make the graph you want. Often you’ll need to create some new variables or summaries to answer your questions with your data, or maybe you just want to rename the variables or reorder the observations to make the data a little easier to work with. You’ll learn how to do all that (and more!) in this chapter, which will introduce you to data transformation using the dplyr package and a new dataset on flights that departed New York City in 2013.

The goal of this chapter is to give you an overview of all the key tools for transforming a data frame. We’ll start with functions that operate on rows and then columns of a data frame, and then we’ll circle back to talk more about the pipe, an important tool that you use to combine verbs. We will then introduce the ability to work with groups. We will end the chapter with a case study that showcases these functions in action, and we’ll come back to the functions in more detail in later chapters, as we start to dig into specific types of data (e.g., numbers, strings, dates).

Prerequisites

In this chapter we’ll focus on the dplyr package, another core member of the tidyverse. We’ll illustrate the key ideas using data from the nycflights13 package and use ggplot2 to help us understand the data.

library(nycflights13)
library(tidyverse)
#> ── Attaching core tidyverse ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

R Programming for Statistics and Data Science

R Programming for Statistics and Data Science

365 Careers Ltd.
R for Data Science

R for Data Science

Hadley Wickham, Garrett Grolemund

Publisher Resources

ISBN: 9781492097396Errata PageSupplemental Content