Chapter 15. Getting Your Data into Shape

When it comes to making data graphics, half the battle occurs before you call any plotting commands. Before you pass your data to the plotting functions, it must first be read in and given the correct structure. The data sets provided with R are ready to use, but when dealing with real-world data, this usually isn’t the case: you’ll have to clean up and restructure the data before you can visualize it.

The recipes in this chapter will often use packages from the tidyverse. For a little background about the tidyverse, see the introduction section of Chapter 1. I will also show how to do many of the same tasks using base R, because in some situations it is important to minimize the number of packages you use, and because it is useful to be able to understand code written for base R.


The %>% symbol, also known as the pipe operator, is used extensively in this chapter. If you are not familiar with it, see Recipe 1.7.

Most of the tidyverse functions used in this chapter are from the dplyr package, and in this chapter, I’ll assume that dplyr is already loaded. You can load it with either library(tidyverse) as shown above, or, if you want to keep things more streamlined, you can load dplyr directly:


Data sets in R are most often stored in data frames. They’re typically used as two-dimensional data structures, with each row representing one case and each column representing one variable. Data frames are essentially lists of ...

Get R Graphics Cookbook, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience live online training, plus books, videos, and digital content from nearly 200 publishers.