Chapter 9. External Data

It’s often useful to include data in a package. If you’re releasing the package to a broad audience, it’s a way to provide compelling use cases for the package’s functions. If you’re releasing the package to a more specific audience, interested either in the data (e.g., NZ census data) or the subject (e.g., demography), it’s a way to distribute that data along with its documentation (as long as your audience is R users).

There are three main ways to include data in your package, depending on what you want to do with it and who should be able to use it:

  • If you want to store binary data and make it available to the user, put it in data/. This is the best place to put example datasets.

  • If you want to store parsed data, but not make it available to the user, put it in R/sysdata.rda. This is the best place to put data that your functions need.

  • If you want to store raw data, put it in inst/extdata.

A simple alternative to these three options is to include the data in the source of your package, either creating it by hand or using dput() to serialize an existing dataset into R code.
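As a rough illustration of the dput() route, the sketch below serializes a small data frame to R code and then evaluates that code to rebuild it. The data frame and the variable names (df, code, df2) are made up for this example; in a package you would paste the printed expression into a file under R/.

```r
# A small, made-up data frame we might want to embed in package source.
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

# dput() prints an R expression that recreates the object;
# capture.output() collects that printed text as a character vector.
code <- paste(capture.output(dput(df)), collapse = "\n")
cat(code, "\n")

# Evaluating the expression rebuilds an equivalent object.
df2 <- eval(parse(text = code))
all.equal(df, df2)
```

This works best for small objects: the generated code grows with the data, and very large or complex objects become hard to read and maintain in source form.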

Each possible location is described in more detail in the following sections.

Exported Data

The most common location for package data is (surprise!) data/. Each file in this directory should be an .RData file created by save() containing a single object (with the same name as the file). The easiest way to adhere to these rules is to use devtools::use_data() (in later releases this function lives in the usethis package as usethis::use_data()):

 x <- sample(1000)
 devtools::use_data(x, ...
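To make the rules for data/ concrete, here is a minimal sketch that follows them by hand with base R's save(). It is not what the helper function does internally, only an illustration of "one object per file, named after the object". The package name mypkg and the temporary directory are assumptions for this example; in a real package you would work in the package root.

```r
# A temporary directory stands in for the package root (hypothetical "mypkg").
pkg <- file.path(tempdir(), "mypkg")
dir.create(file.path(pkg, "data"), recursive = TRUE, showWarnings = FALSE)

x <- sample(1000)

# One object per file, with the file named after the object.
path <- file.path(pkg, "data", "x.RData")
save(x, file = path)

# Loading into a fresh environment shows which objects the file contains.
env <- new.env()
loaded <- load(path, envir = env)
loaded
```

load() returns the names of the objects it restored, so checking that value is a quick way to confirm that a file in data/ holds exactly one object with the expected name.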
