Chapter 4. Case Study: Creating a Package
Before describing more systematically the components that RStudio provides for development work in R (most importantly the source-code editor), we will pick up where we left off on our case study of analyzing the group behavior and individual movements of a colony of naked mole rats. Here, our goal is to illustrate one way to do package development with RStudio.
Imagine that, after a short time using RStudio interactively, we
are pretty happy using the command line for short commands but have learned
to really enjoy writing scripts in the
code editor. Even 2- or 3-line commands are much
easier to debug when done in the editor. The directness of typing at the
command line isn’t lost, as our fingers are now trained to hit Ctrl+Enter to
send the current line or selection to the R interpreter, or even
Ctrl+Shift+Enter to send the entire buffer (Command, not Ctrl, for Mac
users). We never need to leave the keyboard unless we choose to.
Along the way, we have been able to create a large script file that we now want to share with a colleague.
How do we do this? There are many ways. We could just send along the entire script, or with just a bit more extra work, we could share our work through a version control system. In some cases, this approach might be the best thing to do, as then our colleague can do exactly what we have been doing. However, there are many situations where this isn’t so great. For example, perhaps this colleague doesn’t know R too well and she just wants to have some functions to use. Plus, she may want to have some documentation on how to actually use the functions. Besides, we might want to share our work much more widely. At times like this, R users start to think about packages.
Packages are how R users extend R in a structured, reusable way.
CRAN houses over 3,000 of them, and many
more are scattered widely throughout the internet at R-specific repositories
like those hosted by the Bioconductor project or on
r-forge. Packages also appear on code-hosting
sites such as http://github.com or
http://code.google.com. However, we don’t need to get
packages from a website. We can start by creating our own
local packages to share with colleagues. Let’s see how,
taking advantage of the features of the code-editor component of RStudio.
Creating Functions from Script Files
Currently, our script is one long set of commands that processes the data files and then makes a plot. We first want to turn some of the commands into functions. Functions make code reuse much more feasible. A basic pattern in R is to write functions for simple small tasks and then chain these tasks together using function composition. This is similar to the composition concept from mathematics, where we take the output from one function and use this as the input for another.
RStudio’s code editor, where we wrote our script, makes working with functions quite easy.
We’ll illustrate some of the features here.
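As a minimal sketch of the composition pattern, the three functions below are hypothetical stand-ins (they are not taken from our script). Each does one small task, and the last line chains them so that the output of one becomes the input of the next.

```r
# Three small, single-purpose functions (hypothetical examples)
clean     <- function(x) x[!is.na(x)]                  # drop missing values
center    <- function(x) x - mean(x)                   # subtract the mean
summarize <- function(x) c(min = min(x), max = max(x)) # report the range

raw <- c(2, NA, 4, 6)

# Function composition: read from the inside out
result <- summarize(center(clean(raw)))
print(result)
```

Because each piece can be called on its own, a surprising result can be traced by running the chain one step at a time.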
For our task, we have a script that does four things to process the data:
It reads in the data and does some data cleaning.
It creates zoo objects for each mole rat.
It merges these into one large multivariate zoo object.
It makes a plot.
This naturally lends itself to four functions. Keeping our functions small and focused on a single task makes them easier to test and debug. It can also help later on in the development of a package, when we may think about combining similar tasks into more general functions, although we won’t see that here.
RStudio provides a convenient action for turning a series of
commands into a function. The magic wand toolbar button in the code editor
provides the Extract Function action. We
simply highlight the text we want and then wave the magic wand. Tada! In
Figures 4-1 and 4-2, we
illustrate the changes introduced by the magic wand. Our first function
will be the one that reads the data into a data frame in which the time
column uses one of R’s date classes.
The magic wand does most of the work, but not all in this case, as the text can’t adequately be parsed. In R, functions have arguments, a body of commands, a return value, and optionally are assigned to a name. We specify the name in the extract-function dialog, but for this instance added the function argument “f” and the return value “x” after the extraction.
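To make these parts concrete, here is a sketch of what such a function might look like after the manual touch-up. Only the name readNMRData, the argument f, and the return value x come from the text. The file layout (semicolon-separated fields, data on every other line), the column names, and the time format are assumptions made purely for illustration.

```r
# Name: readNMRData; argument: f (path to a data file); return value: x.
# ASSUMPTION: semicolon-separated fields with data on odd-numbered lines.
readNMRData <- function(f) {
  lines <- readLines(f)
  lines <- lines[seq(1, length(lines), by = 2)]   # keep lines 1, 3, 5, ...
  x <- read.table(text = lines, sep = ";",
                  col.names = c("time", "id", "location"),  # hypothetical names
                  stringsAsFactors = FALSE)
  # Convert the time column to one of R's date-time classes
  x$time <- as.POSIXct(x$time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  x   # the value of the last expression is returned
}

# Try it on a small made-up file
tmp <- tempfile(fileext = ".txt")
writeLines(c("2011-01-01 10:00:00;rat1;3", "",
             "2011-01-01 10:00:05;rat2;1", ""), tmp)
x <- readNMRData(tmp)
str(x)
```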
We don’t try to automate the process of converting the
rtf file into a txt file, as
that isn’t so easy. We will put together the commands to process the data
frame and create a list of zoo objects
(one for each mole rat) and the commands to create a multivariate
zoo object. This will be done with the magic
wand in a similar manner as above.
A Package Skeleton
Packages must have a special structure, detailed in the
Writing R Extensions manual that accompanies a
standard R installation. We can consult that for detailed reference, but
for now all we need to know is that the function
package.skeleton will set up this structure for
us. (The ProjectTemplate package can be used to provide even
more detail to this process.)
This function needs, at a minimum, just two things: where and what. As in, where are we going to write our package files and what will we initially populate them with? We choose the directory ~/NMRpackage, and will start with one of the functions from our script:
> setwd("~")
> package.skeleton("NMRpackage", c("readNMRData"))
Creating directories ...
Creating DESCRIPTION ...
Creating Read-and-delete-me ...
Saving functions and data ...
Making help files ...
Done.
Further steps are described in '/~NMRpackage/Read-and-delete-me'.
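For readers who would like to experiment before touching their home directory, the following sketch runs the same call against a temporary directory and lists what was created. The function body is a placeholder, and passing the function through a separate environment (rather than the global workspace) is a choice made here only to keep the snippet self-contained.

```r
# Work in a throwaway directory so nothing in the home directory is touched
where <- tempfile("pkgdemo")
dir.create(where)

# A placeholder function to seed the package (the body is hypothetical)
e <- new.env()
e$readNMRData <- function(f) read.table(f)

# "what": the named objects found in environment e; "where": the path argument
package.skeleton("NMRpackage", list = "readNMRData",
                 environment = e, path = where)

# List what was created, relative to the new package directory
created <- list.files(file.path(where, "NMRpackage"), recursive = TRUE)
print(created)
```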
We now want to inform RStudio that we are working on a new project,
allowing us to compartmentalize our session data and accompanying
functions. A more detailed description of projects in RStudio is postponed
to Organizing Activities with Projects; for now, we note that we used the
directory just created by package.skeleton.
After creating a new project, we refresh the
Files browser to show us which files were
created (Figure 4-3).
We see two unexpected files in the base directory and two
subdirectories. We first investigate what is in the
Read-and-delete-me file by clicking on the link and
reading. For now, there is nothing we need. It says to delete the file, so we
oblige by selecting the file’s checkbox and clicking the Delete toolbar
button. The DESCRIPTION file is used by R
to organize its packages. Ours needs to be updated to reflect our package.
Clicking the link opens the file in the code editor. Here we edit the
Title: field and some others. Since our
package will rely on the zoo and
ggplot2 packages, we add those to the
Depends field. This file is in
dcf format with a keyword (the name before the
colon) and value on one line. If you need more lines for the value, just
give any additional lines some indented space, as was done for the
“Description:” line (see Figure 4-4).
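The keyword-and-value layout can be explored from R itself, since base R reads and writes the dcf format. The sketch below uses illustrative field values only (the title and description text are invented, and it assumes the package depends on zoo and ggplot2); it writes a small file and reads it back.

```r
# Fields for a DESCRIPTION-style file; the values are illustrative only.
# ASSUMPTION: the package depends on zoo and ggplot2.
desc <- data.frame(
  Package = "NMRpackage",
  Title = "Tools for naked mole rat movement data",
  Depends = "zoo, ggplot2",
  Description = paste("Functions to read, process, and plot data on the",
                      "movements of a colony of naked mole rats. A long",
                      "value like this one needs more than one line."),
  stringsAsFactors = FALSE
)

f <- tempfile()
write.dcf(desc, file = f)        # one "Keyword: value" entry per field
cat(readLines(f), sep = "\n")    # inspect the text that was written

# Reading it back gives a character matrix with one column per keyword
back <- read.dcf(f)
print(back[1, "Depends"])
```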
The R directory is where all the programming is done. In this directory we have the files containing our function definitions. We change our working directory (Ctrl+Shift+K), and the file browser updates to show this directory.
We see that the call to
package.skeleton created a file named
readNMRData.R, containing the definition of the one
function we gave it. We could have one file per function, but that will
quickly get unwieldy. We could also put all our functions into one
file—but again that gets bulky. A better strategy is to group similar
functions together into a file. For now, we will create a file to hold our
data-processing functions (process.R), and another
file for our still-to-be-written visualization functions.
To rename our file through the
Files browser, we select its checkbox and then
click the Rename toolbar button. A
dialog prompts us for the new name. We open this file for editing by
clicking on its link. We then open our original script file
(one-big-script-file.R, which isn’t in our new
project) by using the Open File toolbar button on the application’s
toolbar. We then proceed to use the magic wand to create further functions,
including createStateMatrix. These are then
copied and pasted into the appropriate file in the R directory.
RStudio has some facilities for navigating through a file full of functions. The “Go to file/function” search box in the main toolbar allows one to quickly and conveniently navigate to any function in a project. For in-file navigation, the lower-left corner of the code-editor component holds a label (Figure 4-5) that contains the line and column number and, next to that, a combobox that can be popped up to select a function to jump to.
We next open a new R script
(Shift+Ctrl+N or through the File > New menu)
for holding any functions for visualization and add a function to use
ggplot2 to make a graphic. We save the
file and refresh the
Files browser so that it appears.
Documenting Functions With roxygen2
The package skeleton also includes a man subdirectory. In R, all
exported functions must be documented in some file. Such files are written
in R’s Rd markup language. Looking
in the man directory, we see that two
files were made: readNMRData.Rd (a stub for our
function), and NMRpackage-package.Rd (a stub for
documenting the entire package). We open up the latter and make the
required changes. At a minimum, the lines that have paired double tildes
are edited to match our idea of the package.
We could go on to edit the readNMRData.Rd
template, but instead we will use the
roxygen2 package to document our package’s
functions. Although R is organized around a workflow where one writes the
function then documents it separately (presumably after the function is
done), many other programming languages have facilities for writing in a
literate programming style using inline
documentation. Some R programmers are used to this functionality (it
simplifies iterative function writing and documenting) and the
roxygen2 package makes this feasible. For the
modest demands of this package, it is an excellent choice.
The Rd format has a number of
required sections, and using roxygen2
does not eliminate the need for following that structure. All directives
appear in comments (we use #').
Keywords are prefaced with an at symbol (@).
The main sections that are usually defined
are a title (taken from the first line), an optional description (taken
from the first paragraph), the function’s arguments (defined through
@param tags), a description of the
return value (@return), whether the
function will be exported (@export),
and, optionally, some examples. R has tools to check that the right format
is followed. In particular, it will catch a failure to document
all the arguments or a misnamed section tag.
Rd markup is fairly
straightforward and is described in the Writing R
Extensions manual. An example of a documented function is shown
in Figure 4-6.
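For readers without the figure at hand, a roxygen2-style block might look roughly like the sketch below. The wording of the documentation, the example file name, and the function body are all invented for illustration; only the function name and its argument f come from our case study.

```r
#' Read a naked mole rat data file
#'
#' Reads one data file from the experiment into a data frame.
#'
#' @param f Path to the data file.
#' @return A data frame with one row per observation.
#' @export
#' @examples
#' \dontrun{
#' x <- readNMRData("some-file.txt")  # hypothetical file name
#' }
readNMRData <- function(f) {
  x <- read.table(f, header = TRUE)  # placeholder body, not the real parser
  x
}

# The comments are ignored by R itself, so the function works as usual
tmp <- tempfile()
writeLines(c("id location", "rat1 3", "rat2 1"), tmp)
x <- readNMRData(tmp)
```

The first comment line becomes the title, the next paragraph the description, and each tag maps onto a section of the generated Rd file.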
We can also create a
NEWS file to
keep track of changes between versions of the package. This may not be
useful for this package, but if the package proves popular with our
colleagues, a NEWS file will help them
see what has happened with our package (Figure 4-7).
The NEWS file is a plain-text file with
a simple structure. We open it through the
File > New menu, but this time select
Text File. The code editor will present a
different toolbar in this case, as it makes no sense to be able to source
R code from this file.
The devtools Package
Testing a package can involve loading the package, testing it,
making desired changes, then reloading the package. This workflow can get
tedious; it can even involve closing and restarting R to flush residual
changes. The devtools package is
designed to make this task easier.
If it isn’t installed, we can install it from CRAN using the
Packages component (Figure 4-8). Click the
Install Packages toolbar button and then type the desired package name
into the dialog. (An auto-complete feature makes this easy.) Leaving the
Install dependencies option checked
will also install the
roxygen2 and
testthat packages, if needed.
Windows users will want to install
RTools, a set of development tools for Windows
maintained by D. Murdoch. The files are found at
The devtools package provides the
load_all function to reload the package
without having to restart R. To use it, we define a
package variable (pkg) pointing to the directory (an
“.Rpackages” file can be used to avoid this step),
then load the code (Figure 4-9). The new functions
do not appear in the Workspace browser,
as they are stored in an environment just below the global workspace, but
they do show up through RStudio’s code-completion functionality.
We can try it out. In doing so, if we realize that we would like to
change some function, no problem. We make the adjustment in the code
editor, save the file, then reissue the load_all command.
For working with documentation, the
devtools package has the
document function (as in
document(pkg)) to call
roxygen2 to create the corresponding
Rd files and
show_news to view the NEWS file.
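The edit-and-reload cycle might be sketched as follows. This assumes the package lives in ~/NMRpackage and that devtools accepts a plain path for pkg; the way pkg is defined in the figure may differ. The calls are guarded so that the snippet does nothing on a machine where devtools or the directory is missing.

```r
pkg <- "~/NMRpackage"   # ASSUMPTION: path to the package source directory

# Guarded so the snippet is harmless where devtools or the directory is absent
if (requireNamespace("devtools", quietly = TRUE) && dir.exists(pkg)) {
  devtools::load_all(pkg)   # (re)load the package code without restarting R
  devtools::document(pkg)   # run roxygen2 to regenerate the Rd files in man/
}
```

After editing and saving a function, running the same two calls again picks up the change.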
We can add our testing commands in an example, but we will need to
have some data to use when we distribute our package. We wrote
readNMRData to accept any data file in the same
format, as we imagine our colleagues using it with other data sets
generated by the experiment. However, we can combine the data we have into
the package for testing and example purposes. R has the
data directory for including data in a package.
This data should be in a format R can easily read in, and ours isn’t (it has a
different separator and we need to skip every other line). So instead, we
use the inst directory to create our
own data directory. We do not call this
data, as this would interfere with the
data directory that R processes automatically.
We create the needed directories with the
New Folder toolbar button in the
Files browser.
How you get the package data file into this folder depends on how
you are using RStudio. If you are using the desktop version, you simply
copy the file over using your usual method (e.g., Finder, command line,
Windows Explorer). If you are using the server version, then this won’t
work. In that case, the
Files browser has an additional
Upload toolbar button
to allow you to upload your file. This button summons a dialog that allows
you to browse for a file or a zip archive of many
files (Figure 4-10).
R documentation files have the option of an “examples” section,
where one would usually see documentation of example usage of the
function(s). This is a very good idea, as it gives the user a sample
template to build on. In Figure 4-11, we see
sample code added to our
function’s documentation.
For an installed package, examples can be run by the user through
the example function. During development, the
programmer can run them through the devtools package.
Although examples will typically be run during package development,
it is a good practice to include tests of the package’s core functions as
well. Tests serve a different purpose than examples. Well-designed tests
can help find bugs introduced by changes to functions, a not uncommon
occurrence. The devtools package can run
tests (written for testthat) that appear in the
inst/tests subdirectory of the package.
Building and Installing the Package
Packages can be checked for issues, built for distribution, and installed for subsequent use. RStudio does not have any features for performing these tasks, but all can be done within devtools or from a shell outside of the R process. For example, a UNIX or Mac OS X user could run:
> system("cd ~; R CMD build NMRpackage")
We could replace build with
check to check our package for consistency with R’s
expectations. Though checking isn’t required for sharing a package with
colleagues, a package distributed on CRAN should pass the check phase
cleanly. Checking is a good thing in any case.
Installing locally built packages can be done from the
Install Packages dialog by selecting the option to
install from a
Package Archive File (.tgz).
The devtools package provides the functions
check, build, and install for performing these three basic tasks.
For Windows users, the WinBuilder project
(http://win-builder.R-project.org) is a web service
that can be used to create packages. Otherwise, building R packages under
Windows is made easier using the
RTools bundle mentioned earlier.