Skip to Content
R for Data Science, 2nd Edition
book

R for Data Science, 2nd Edition

by Hadley Wickham, Mine Çetinkaya-Rundel, Garrett Grolemund
June 2023
Beginner to intermediate
576 pages
12h 57m
English
O'Reilly Media, Inc.
Content preview from R for Data Science, 2nd Edition

Chapter 21. Databases

Introduction

A huge amount of data lives in databases, so it’s essential that you know how to access it. Sometimes you can ask someone to download a snapshot into a .csv file for you, but this gets painful quickly: every time you need to make a change, you’ll have to communicate with another human. You want to be able to reach into the database directly to get the data you need, when you need it.

In this chapter, you’ll first learn the basics of the DBI package: how to use it to connect to a database and then retrieve data with a SQL1 query. SQL, short for Structured Query Language, is the lingua franca of databases and is an important language for all data scientists to learn. That said, we’re not going to start with SQL, but instead we’ll teach you dbplyr, which can translate your dplyr code to SQL. We’ll use that as a way to teach you some of the most important features of SQL. You won’t become a SQL master by the end of the chapter, but you will be able to identify the most important components and understand what they do.

Prerequisites

In this chapter, we’ll introduce DBI and dbplyr. DBI is a low-level interface that connects to databases and executes SQL; dbplyr is a high-level interface that translates your dplyr code to SQL queries and then executes them with DBI.

library(DBI)
library(dbplyr)
library(tidyverse)

Database Basics

At the simplest level, you can think about a database as a collection of data frames, called tables in database terminology. ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.

Read now

Unlock full access

More than 5,000 organizations count on O’Reilly

AirBnbBlueOriginElectronic ArtsHomeDepotNasdaqRakutenTata Consultancy Services

QuotationMarkO’Reilly covers everything we've got, with content to help us build a world-class technology community, upgrade the capabilities and competencies of our teams, and improve overall team performance as well as their engagement.
Julian F.
Head of Cybersecurity
QuotationMarkI wanted to learn C and C++, but it didn't click for me until I picked up an O'Reilly book. When I went on the O’Reilly platform, I was astonished to find all the books there, plus live events and sandboxes so you could play around with the technology.
Addison B.
Field Engineer
QuotationMarkI’ve been on the O’Reilly platform for more than eight years. I use a couple of learning platforms, but I'm on O'Reilly more than anybody else. When you're there, you start learning. I'm never disappointed.
Amir M.
Data Platform Tech Lead
QuotationMarkI'm always learning. So when I got on to O'Reilly, I was like a kid in a candy store. There are playlists. There are answers. There's on-demand training. It's worth its weight in gold, in terms of what it allows me to do.
Mark W.
Embedded Software Engineer

You might also like

R for Data Science

R for Data Science

Hadley Wickham, Garrett Grolemund

Publisher Resources

ISBN: 9781492097396Errata Page