Part IV. Import
In this part of the book, you’ll learn how to import a wider range of data into R, as well as how to get it into a form useful for analysis. Sometimes this is just a matter of calling a function from the appropriate data import package. But in more complex cases it might require both tidying and transformation to get to the tidy rectangle that you’d prefer to work with.
Figure IV-1. Data import is the beginning of the data science process; without data you can’t do data science!
In this part of the book you’ll learn how to access data stored in the following ways:
In Chapter 20, you’ll learn how to import data from Excel spreadsheets and Google Sheets.
In Chapter 21, you’ll learn about getting data out of a database and into R (and you’ll also learn a little about how to get data out of R and into a database).
In Chapter 22, you’ll learn about Arrow, a powerful tool for working with out-of-memory data, particularly when it’s stored in the parquet format.
In Chapter 23, you’ll learn how to work with hierarchical data, including the deeply nested lists produced by data stored in the JSON format.
In Chapter 24, you’ll learn web “scraping,” the art and science of extracting data from web pages.
There are two important tidyverse packages that we don’t discuss here: haven and xml2. If you are working with data from SPSS, Stata, and SAS files, check out the haven ...