It’s often useful to include data in a package. If you’re releasing the package to a broad audience, it’s a way to provide compelling use cases for the package’s functions. If you’re releasing the package to a more specific audience, interested either in the data (e.g., NZ census data) or the subject (e.g., demography), it’s a way to distribute that data along with its documentation (as long as your audience is R users).
There are three main ways to include data in your package, depending on what you want to do with it and who should be able to use it:
If you want to store binary data and make it available to the user, put it in data/. This is the best place to put example datasets.
If you want to store parsed data, but not make it available to the user, put it in R/sysdata.rda. This is the best place to put data that your functions need.
If you want to store raw data, put it in inst/extdata.
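As a rough sketch of the three locations above, the following base R code builds a throwaway package skeleton in a temporary directory and writes one file into each place. The package name `mypkg`, the `census` data frame and the `region_codes` lookup are hypothetical names chosen for illustration, not part of any real package.

```r
# Hypothetical package skeleton, built in a temporary directory.
pkg <- file.path(tempdir(), "mypkg")
dir.create(file.path(pkg, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg, "R"), showWarnings = FALSE)
dir.create(file.path(pkg, "inst", "extdata"), recursive = TRUE,
           showWarnings = FALSE)

# 1. Example dataset for users: a binary file in data/.
#    The object and the file share the same name.
census <- data.frame(region = c("north", "south"),
                     population = c(1200L, 3400L))
save(census, file = file.path(pkg, "data", "census.RData"))

# 2. Data that only the package's own functions need: R/sysdata.rda.
region_codes <- c(north = "N", south = "S")
save(region_codes, file = file.path(pkg, "R", "sysdata.rda"))

# 3. Raw data kept in its original (here, CSV) form: inst/extdata.
write.csv(census, file.path(pkg, "inst", "extdata", "census.csv"),
          row.names = FALSE)

# List what was written, relative to the package root.
stored <- sort(list.files(pkg, recursive = TRUE))
print(stored)
```

Once a package laid out like this is installed, the raw file would typically be located with `system.file("extdata", "census.csv", package = "mypkg")`.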
A simple alternative to these three options is to include the data in the source of your package, either by creating it by hand or by using dput() to serialize an existing dataset into R code.
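To make the dput() route concrete, here is a minimal sketch using only base R. The data frame `small` is a hypothetical example. The printed expression is what you would paste into a file under R/.

```r
# Hypothetical small dataset that we want to embed in package source.
small <- data.frame(x = 1:3, y = c("a", "b", "c"),
                    stringsAsFactors = FALSE)

# dput() writes an R expression that recreates the object;
# capture.output() collects that text instead of printing it.
code <- capture.output(dput(small))
cat(code, sep = "\n")

# Evaluating the generated code rebuilds the dataset.
rebuilt <- eval(parse(text = code))
```

This works well for small objects. For anything large, the generated code becomes unwieldy and one of the three locations above is usually a better fit.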
Each possible location is described in more detail in the following sections.
The most common location for package data is (surprise!) data/. Each file in this directory should be an .RData file created by save() and containing a single object with the same name as the file. The easiest way to adhere to these rules is to use devtools::use_data():
x <- sample(1000)
devtools::use_data(x, ...