R for Data Science, 2nd Edition
by Hadley Wickham, Mine Çetinkaya-Rundel, Garrett Grolemund
June 2023
Beginner to intermediate
576 pages
12h 57m
English
O'Reilly Media, Inc.

Part IV. Import

In this part of the book, you’ll learn how to import a wider range of data into R, as well as how to get it into a form useful for analysis. Sometimes this is just a matter of calling a function from the appropriate data import package. But in more complex cases, it might require both tidying and transformation to get to the tidy rectangle that you’d prefer to work with.

[Figure: our data science model, with import highlighted in blue.]
Figure IV-1. Data import is the beginning of the data science process; without data you can’t do data science!

In this part of the book you’ll learn how to access data stored in the following ways:

  • In Chapter 20, you’ll learn how to import data from Excel spreadsheets and Google Sheets.

  • In Chapter 21, you’ll learn about getting data out of a database and into R (and you’ll also learn a little about how to get data out of R and into a database).

  • In Chapter 22, you’ll learn about Arrow, a powerful tool for working with out-of-memory data, particularly when it’s stored in the parquet format.

  • In Chapter 23, you’ll learn how to work with hierarchical data, including the deeply nested lists produced by data stored in the JSON format.

  • In Chapter 24, you’ll learn web “scraping,” the art and science of extracting data from web pages.

There are two important tidyverse packages that we don’t discuss here: haven and xml2. If you are working with data from SPSS, Stata, and SAS files, check out the haven ...

ISBN: 9781492097396