Chapter 21. Databases

Introduction

A huge amount of data lives in databases, so it’s essential that you know how to access it. Sometimes you can ask someone to download a snapshot into a .csv file for you, but this gets painful quickly: every time you need to make a change, you’ll have to communicate with another human. You want to be able to reach into the database directly to get the data you need, when you need it.

In this chapter, you’ll first learn the basics of the DBI package: how to use it to connect to a database and then retrieve data with a SQL1 query. SQL, short for Structured Query Language, is the lingua franca of databases and is an important language for all data scientists to learn. That said, we’re not going to start with SQL, but instead we’ll teach you dbplyr, which can translate your dplyr code to SQL. We’ll use that as a way to teach you some of the most important features of SQL. You won’t become a SQL master by the end of the chapter, but you will be able to identify the most important components and understand what they do.

Prerequisites

In this chapter, we’ll introduce DBI and dbplyr. DBI is a low-level interface that connects to databases and executes SQL; dbplyr is a high-level interface that translates your dplyr code to SQL queries and then executes them with DBI.

library(DBI)
library(dbplyr)
library(tidyverse)

Database Basics

At the simplest level, you can think about a database as a collection of data frames, called tables in database terminology. ...

Get R for Data Science, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.