When the first edition of this book was published five years ago, the phrase “data science” had only recently entered the popular lexicon. Today, the phrase is unavoidable if you’re involved with the sciences, journalism, or high-tech industries. Many interrelated developments have made this possible: there’s a general awareness that understanding quantitative data has tangible benefits; there are better and more widely available educational resources about how to do data science; and finally, the tools have evolved, becoming easier to use and get started with.
The goal of this book is to help you understand your data by visualizing it, and to help you convey that understanding to others. You can think of data analysis as the process of transforming raw data into ideas in somebody’s mind. One of the key techniques for doing this is to create visualizations of the data. Our brains have highly developed visual pattern detection systems, and data visualizations are an efficient way to use those systems to get quantitative information into a person’s mind.
Each recipe in this book lists a problem and a solution. In most cases, the solutions I offer aren’t the only way to do things in R, but they are, in my opinion, the best way. One of the reasons for R’s popularity is that there are many available add-on packages, each of which provides some functionality for R. There are many packages for visualizing data in R, but this book primarily uses ggplot2.
This book isn’t meant to be a comprehensive manual of all the different ways of creating data visualizations in R, but hopefully it will help you figure out how to make the graphics you have in mind. Or, if you’re not sure what you want to make, browsing its pages may give you some ideas about what’s possible.
This book is intended for readers who have at least a basic understanding of R. The recipes in this book will show you how to do specific tasks. I’ve tried to use examples that are simple, so that you can understand how they work and transfer the solutions over to your own problems.
Software and Platform Notes
Most of the recipes here use the ggplot2 plotting package, and the dplyr package for data wrangling. These packages are both part of the tidyverse, which is a collection of R packages that make it easier to work with data. Some of the recipes require the most recent version of ggplot2, 3.0.0, and this in turn requires a relatively recent version of R. You can always get the latest version of R from the main R project site, http://www.r-project.org.
You can use the recipes with just a surface understanding of ggplot2, but if you want a deeper understanding of how it works, see Appendix A.
Once you’ve installed R, you can install the necessary packages. In addition to the tidyverse packages, you’ll also want to install the gcookbook package, which contains data sets for many of the examples in this book. You can install the tidyverse packages and the gcookbook package with:
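A minimal sketch of those installation commands, assuming both packages are installed from CRAN under the names given above:

```r
# Install the tidyverse collection, which includes ggplot2 and dplyr
install.packages("tidyverse")

# Install gcookbook, which provides the example data sets
install.packages("gcookbook")
```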
You may be asked to choose a mirror site for CRAN, the Comprehensive R Archive Network. Any of the sites should work, but it’s a good idea to choose one close to you because it will likely be faster than one far away. Once you’ve installed the packages, load the ggplot2 and dplyr packages in each R session in which you want to use the recipes in this book:
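Loading the two packages might look like the following sketch, which assumes they have already been installed:

```r
# Attach the plotting and data-wrangling packages for this session
library(ggplot2)
library(dplyr)
```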
The recipes in this book will assume that you’ve already loaded ggplot2 and dplyr, so they won’t show these lines of code.
If you see an error like this, it means that you forgot to load ggplot2:
Error: could not find function "ggplot"
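As a quick check that the setup works, loading ggplot2 and drawing a simple plot should run without that error. This is only a minimal sketch: the mtcars data set (which ships with R) and the choice of columns are illustrative assumptions, not an example taken from this book.

```r
library(ggplot2)

# mtcars ships with R; wt and mpg are two of its numeric columns
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Draw the scatter plot on the active graphics device
print(p)
```

If the `library(ggplot2)` line is left out, the call to `ggplot()` fails with the error shown above.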
The major platforms for R are macOS, Linux, and Windows, and all the recipes in this book should work on all of these platforms. There are some platform-specific differences when it comes to creating bitmap output files, and these differences are covered in Chapter 14.
Conventions Used in This Book
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.
This element signifies a tip or suggestion.
This element signifies a general note.
This element indicates a warning or caution.
For almost 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.
Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, please visit http://oreilly.com.
How to Contact Us
- O’Reilly Media, Inc.
- 1005 Gravenstein Highway North
- Sebastopol, CA 95472
- 800-998-9938 (in the United States or Canada)
- 707-829-0515 (international or local)
- 707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://bit.ly/r-graphics-cookbook-2e.
To comment or ask technical questions about this book, send email to firstname.lastname@example.org.
For news and more information about our books and courses, see our website at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
Acknowledgments
No book is the product of a single person. There are many people who have helped make this book possible. I’d like to thank the R community for creating R and for fostering a dynamic ecosystem around it. Thanks to Hadley Wickham and other members of the tidyverse team for creating the software that this book revolves around, and for opening up many opportunities for me to deepen my understanding of R, data analysis, and visualization. I’m grateful that my employer, RStudio, not only makes it possible for me to work with some of the leading lights in the R community, but also pays us to work on software that the entire R community benefits from.
Thanks to the technical reviewers for this book, and the first edition of it: Garrett Grolemund, Thomas Lin Pedersen, Paul Teetor, Hadley Wickham, Dennis Murphy, and Erik Iverson. Their depth of knowledge and attention to detail have resulted in a much better book. Thanks to Jen Wang for her help editing this edition of the book.
Finally, I would like to thank my wife, Sylia, for her support and understanding—and not just with regard to the book.