Chapter 7. Data Import

Introduction

Working with data provided by R packages is a great way to learn data science tools, but you want to apply what you’ve learned to your own data at some point. In this chapter, you’ll learn the basics of reading data files into R.

Specifically, this chapter will focus on reading plain-text rectangular files. We’ll start with practical advice for handling features such as column names, types, and missing data. You will then learn about reading data from multiple files at once and writing data from R to a file. Finally, you’ll learn how to handcraft data frames in R.

Prerequisites

In this chapter, you’ll learn how to load flat files in R with the readr package, which is part of the core tidyverse:

library(tidyverse)

Reading Data from a File

To begin, we’ll focus on the most common rectangular data file type: CSV, which is short for “comma-separated values.” Here is what a simple CSV file looks like. The first row, commonly called the header row, gives the column names, and the following six rows provide the data. The columns are separated, aka delimited, by commas.

Student ID,Full Name,favourite.food,mealPlan,AGE
1,Sunil Huffmann,Strawberry yoghurt,Lunch only,4
2,Barclay Lynn,French fries,Lunch only,5
3,Jayendra Lyne,N/A,Breakfast and lunch,7
4,Leon Rossini,Anchovies,Lunch only,
5,Chidiegwu Dunkel,Pizza,Breakfast and lunch,five
6,Güvenç Attila,Ice cream,Lunch only,6

Table 7-1 represents of the same data as a table.

Table 7-1. Data ...

Get R for Data Science, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.