Chapter 20. Spreadsheets

Introduction

In Chapter 7 you learned about importing data from plain-text files like .csv and .tsv. Now it’s time to learn how to get data out of a spreadsheet, either an Excel spreadsheet or a Google Sheet. This will build on much of what you’ve learned in Chapter 7, but we will also discuss additional considerations and complexities when working with data from spreadsheets.

If you or your collaborators are using spreadsheets for organizing data, we strongly recommend reading the paper “Data Organization in Spreadsheets” by Karl Broman and Kara Woo. The best practices presented in this paper will save you much headache when you import data from a spreadsheet into R to analyze and visualize.

Excel

Microsoft Excel is a widely used spreadsheet software program where data are organized in worksheets inside of spreadsheet files.

Prerequisites

In this section, you’ll learn how to load data from Excel spreadsheets in R with the readxl package. This package is noncore tidyverse, so you need to load it explicitly, but it is installed automatically when you install the tidyverse package. Later, we’ll also use the writexl package, which allows us to create Excel spreadsheets.

library(readxl)
library(tidyverse)
library(writexl)

Getting Started

Most of readxl’s functions allow you to load Excel spreadsheets into R:

Get R for Data Science, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.