Chapter 6 Getting Data into and out of R
Earlier chapters in this book have described what data in R looks like and how to manipulate it. In this chapter, we describe how to get data into R for analysis, and how to get updated data back out. Reading data into R is the first step in every data cleaning project, so it is in this chapter that the real work of data cleaning begins. But we start with a note on keeping track of your data's provenance, which is the word we use to describe the documentation of your data's history. You should know where every piece of your data came from and on what date you acquired it. A natural place to keep that information is in your scripts. Often, we have one or more scripts devoted to reading in the data. These start with a description of the date and the source of the data, followed by the commands that do the actual reading and some notes on problems we encountered reading the data in.
Keeping track of the data's provenance is always important, but it is especially important if the underlying data is subject to change, perhaps because you extracted it from a database, a public site, or a web page under someone else's control. It is through this sort of documentation that you can make your research reproducible by others.
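As a minimal sketch of what such a script header might look like, the example below records the acquisition date, the source, and a note on a problem in the comments, then performs the read. The file name, date, URL, and the "-99" missing-value code are all hypothetical placeholders, and the small temporary file stands in for data you would actually have downloaded.

```r
# ---------------------------------------------------------------
# Script:    read_survey.R                    (hypothetical name)
# Acquired:  2024-03-15                       (hypothetical date)
# Source:    https://example.org/survey.csv   (placeholder URL)
# Notes:     Missing values are coded as "-99" in the raw file.
# ---------------------------------------------------------------

# Stand-in for the downloaded file, so the sketch is self-contained.
raw_file <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,10", "2,-99", "3,7"), raw_file)

# The command that does the actual reading; "-99" becomes NA.
survey <- read.csv(raw_file, na.strings = "-99", stringsAsFactors = FALSE)

# Optionally attach the provenance to the data frame itself
# (an illustrative convention, not a built-in R feature).
attr(survey, "provenance") <- list(
  source   = "https://example.org/survey.csv",
  acquired = as.Date("2024-03-15"),
  read_on  = Sys.Date()
)

str(survey)
```

Keeping the header in comments means the provenance travels with the code under version control. Be aware that an attribute attached this way may be dropped by some operations that build a new data frame, so the script header remains the primary record.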
6.1 Reading Tabular ASCII Data into Data Frames
Most of the data we read – and write – in R comes to us in the form of rectangular or tabular data, that is, data arranged with observations in rows and measurements in columns. We ...