Chapter 9. External Data

It’s often useful to include data in a package. If you’re releasing the package to a broad audience, it’s a way to provide compelling use cases for the package’s functions. If you’re releasing the package to a more specific audience, interested either in the data (e.g., NZ census data) or the subject (e.g., demography), it’s a way to distribute that data along with its documentation (as long as your audience is R users).

There are three main ways to include data in your package, depending on what you want to do with it and who should be able to use it:

If you want to store binary data and make it available to the user, put it in data/. This is the best place to put example datasets.
If you want to store parsed data, but not make it available to the user, put it in R/sysdata.rda. This is the best place to put data that your functions need.
If you want to store raw data, put it in inst/extdata.

A simple alternative to these three options is to include the data in the source of your package, either creating it by hand or using dput() to serialize an existing dataset into R code.
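As a rough illustration of the dput() route, the sketch below serializes a small data frame to R code and checks that evaluating the code rebuilds the same object. The object name `scores` and its values are made up for the example; they do not come from the text.

```r
# A small hand-made data frame; the name and values are illustrative only.
scores <- data.frame(
  id = c(1L, 2L, 3L),
  score = c(90.5, 82, 77.25)
)

# dput() writes an R expression that recreates the object.
# Capture it as text so it could be pasted into a file under R/.
code <- paste(capture.output(dput(scores)), collapse = "\n")
cat(code, "\n")

# Evaluating that text rebuilds an equivalent object.
rebuilt <- eval(parse(text = code))
stopifnot(isTRUE(all.equal(scores, rebuilt)))
```

This approach tends to suit small datasets only, since the generated code grows with the data.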

Each possible location is described in more detail in the following sections.

Exported Data

The most common location for package data is (surprise!) data/. Each file in this directory should be an .RData file created by save() containing a single object (with the same name as the file). The easiest way to adhere to these rules is to use devtools::use_data():

    x <- sample(1000)
    devtools::use_data(x, ...
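To make the rules above concrete, here is a minimal base R sketch of what they amount to: one object, saved with save(), in a file named after the object. It is not a description of what devtools::use_data() does internally. The temporary directory and the package name `mypkg` are stand-ins assumed for the example.

```r
# Use a temporary directory as a stand-in for a package root (assumption).
pkg_root <- file.path(tempdir(), "mypkg")
data_dir <- file.path(pkg_root, "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

x <- sample(1000)

# One object per file, and the file is named after the object.
rdata_path <- file.path(data_dir, "x.RData")
save(x, file = rdata_path)

# load() returns the names of the objects it restored;
# loading into a fresh environment avoids overwriting x.
env <- new.env()
loaded <- load(rdata_path, envir = env)
```

Here `loaded` holds the names of the objects found in the file, which is a quick way to confirm that a data file contains exactly one object with the expected name.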


R Packages by Hadley Wickham
