Chapter 4. Case Study: Creating a Package
Before describing more systematically the components that RStudio provides for development work in R (most importantly the source-code editor), we will pick up where we left off on our case study of analyzing the group behavior and individual movements of a colony of naked mole rats. Here, our goal is to illustrate one way to do package development with RStudio.
Imagine that, after a short time using RStudio interactively, we are pretty happy using the command line for short commands, but have learned to really enjoy writing scripts in the Code editor. Even 2- or 3-line commands are much easier to debug when done in the editor. The directness of typing at the command line isn’t lost, as our fingers are now trained to hit Ctrl+Enter to send the current line or selection to the R interpreter, or even Ctrl+Shift+Enter to send the entire buffer (Command, not Ctrl, for Mac users). We never need to leave the keyboard unless we choose to.
Along the way, we have been able to create a large script file that we now want to share with a colleague.
How do we do this? There are many ways. We could just send along the entire script, or with just a bit more extra work, we could share our work through a version control system. In some cases, this approach might be the best thing to do, as then our colleague can do exactly what we have been doing. However, there are many situations where this isn’t so great. For example, perhaps this colleague doesn’t know R too well and she just wants to have some functions to use. Plus, she may want to have some documentation on how to actually use the functions. Besides, we might want to share our work much more widely. At times like this, R users start to think about packages.
Packages are how R users extend R in a structured, reusable way. CRAN houses over 3,000 of them, and many more are scattered widely throughout the internet at R-specific repositories like those hosted by the Bioconductor project or on r-forge. Packages also appear on code-hosting sites such as http://github.com or http://code.google.com. However, we don’t need to get packages from a website. We can start by creating our own local packages to share with colleagues. Let’s see how, taking advantage of the features of the code-editor component of RStudio.
Creating Functions from Script Files
Currently, our script is one long set of commands that processes the data files and then makes a plot. We first want to turn some of the commands into functions. Functions make code reuse much more feasible. A basic pattern in R is to write functions for simple small tasks and then chain these tasks together using function composition. This is similar to the composition concept from mathematics, where we take the output from one function and use this as the input for another.
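As a small illustration of this pattern, here is a sketch in which three tiny functions are chained by composition. The function names and the sample identifiers are invented for this example and are not part of the case study’s script.

```r
## Three small, single-purpose functions (hypothetical names).
cleanNames   <- function(x) tolower(trimws(x))          # normalize identifiers
countById    <- function(x) table(x)                    # tabulate occurrences
mostFrequent <- function(tab) names(tab)[which.max(tab)] # pick the top entry

ids <- c(" Rat1", "rat2 ", "RAT1")

## Composition: the output of each function is the input of the next.
result <- mostFrequent(countById(cleanNames(ids)))
result
```

Because each piece does one thing, each can be tried at the console on its own before the pieces are combined.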
RStudio’s integrated Source code editor, where we wrote our script, makes working with functions quite easy. We’ll illustrate some of the features here.
For our task, we have a script that does four things to process the data:
It reads in the data and does some data cleaning.
It creates zoo objects for each mole rat.
It merges these into one large zoo object.
It makes a plot.
This naturally lends itself to four functions. Keeping our functions small and focused on a single task makes them easier to test and debug. It can also help later on in the development of a package, when we may think about combining similar tasks into more general functions, although we won’t see that here.
RStudio provides a convenient action for turning a series of commands into a function. The magic wand toolbar button in the code editor has the Extract Function action. We simply highlight the text we want and then wave the magic wand: tada! In Figures 4-1 and 4-2, we illustrate the changes introduced by the magic wand. Our first function will be the one that reads the data into a data frame where the time column uses one of R’s date classes.
The magic wand does most of the work, but not all in this case, as the text can’t adequately be parsed. In R, functions have arguments, a body of commands, a return value, and optionally are assigned to a name. We specify the name in the extract-function dialog, but for this instance added the function argument “f” and the return value “x” after the extraction.
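To make the shape of such a function concrete, here is a minimal sketch with the argument f and the return value x. The file layout it assumes (semicolon-separated fields, a filler line after every record, and columns named time, id, and location) is invented for illustration; the real data files may differ, and the actual function is the one shown in the figures.

```r
## Sketch only: the separator, column names, and time format are assumptions.
readNMRData <- function(f) {
  lines <- readLines(f)
  lines <- lines[c(TRUE, FALSE)]            # keep lines 1, 3, 5, ...
  x <- read.table(text = lines, sep = ";",
                  col.names = c("time", "id", "location"),
                  stringsAsFactors = FALSE)
  ## store the time column in one of R's date-time classes
  x$time <- as.POSIXct(x$time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  x                                         # the return value
}

## Try it on a small made-up file.
demoFile <- tempfile(fileext = ".txt")
writeLines(c("2011-05-01 10:00:00;rat1;nest",
             "----",
             "2011-05-01 10:00:05;rat2;tunnel",
             "----"), demoFile)
x <- readNMRData(demoFile)
str(x)
```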
We don’t try to automate the process of converting the rtf file into a txt file, as that isn’t so easy. We will put together the commands to process the data frame and create a list of zoo objects (one for each mole rat) and the commands to create a multivariate zoo object. This will be done with the magic wand in a similar manner as above.
A Package Skeleton
Packages must have a special structure, detailed in the Writing R Extensions manual that accompanies a standard R installation. We can consult that for detailed reference, but for now all we need to know is that the function package.skeleton will set up this structure for us. (The ProjectTemplate package can be used to provide even more detail to this process.)
This function needs, at a minimum, just two things: where and what. As in, where are we going to write our package files and what will we initially populate them with? We choose the directory ~/NMRpackage, and will start with one of the functions from our script:
> setwd("~")
> package.skeleton("NMRpackage", c("readNMRData"))
Creating directories ...
Creating DESCRIPTION ...
Creating Read-and-delete-me ...
Saving functions and data ...
Making help files ...
Done.
Further steps are described in '/~NMRpackage/Read-and-delete-me'.
We now want to inform RStudio that we are working on a new project, allowing us to compartmentalize our session data and accompanying functions. A more detailed description of projects in RStudio is postponed to Organizing Activities with Projects. For now, we note that we used the directory just created by package.skeleton.
After creating a new project, we refresh the Files browser to show us which files were created (Figure 4-3).
We see two unexpected files in the base directory and two subdirectories. We first investigate what is in the Read-and-delete-me file by clicking on its link and reading. For now, it contains nothing we need. It says to delete the file, so we oblige by selecting the file’s checkbox and clicking the Delete toolbar button.
The DESCRIPTION file is used by R to organize its packages. Ours needs to be updated to reflect our package. Clicking the link opens the file in the code editor. Here we edit the Title: field and some others. Since our package will rely on the zoo and ggplot2 packages, we add those to the Depends field. This file is in dcf format with a keyword (the name before the colon) and value on one line. If you need more lines for the value, just give any additional lines some indented space, as was done for the “Description:” line (see Figure 4-4).
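The dcf layout can be explored from R itself with the base function read.dcf. The sketch below writes a small file to a temporary location and reads it back; the Title, Version, and Description values are made up for illustration and are not the ones shown in the figure.

```r
## Write a small DESCRIPTION-style file; field values are hypothetical.
descFile <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: NMRpackage",
  "Title: Tools for naked mole rat movement data",
  "Version: 0.1",
  "Depends: zoo, ggplot2",
  "Description: Reads the movement files and builds",
  "    zoo objects and plots from them."      # indented continuation line
), descFile)

## read.dcf returns a character matrix with one column per keyword.
desc <- read.dcf(descFile)
colnames(desc)
desc[1, "Depends"]
desc[1, "Description"]
```

The indented second line is read as part of the Description value rather than as a new field, which is the behavior the continuation rule describes.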
The R directory is where all the programming is done. In this directory we have the files containing our function definitions. We change our working directory (Ctrl+Shift+K), and the file browser updates to show this directory.
We see that the call to package.skeleton created a file named readNMRData.R, containing the definition of the one function we gave it. We could have one file per function, but that will quickly get unwieldy. We could also put all our functions into one file, but again that gets bulky. A better strategy is to group similar functions together into a file. For now, we will create a file to hold our data-processing functions (process.R), and another file for our still-to-be-written visualization functions (visualize.R).
To rename our file through the Files browser, we select its checkbox and then click the Rename toolbar button. A dialog prompts us for the new name. We open this file for editing by clicking on its link. We then open our original script file (one-big-script-file.R, which isn’t in our new project) by using the Open File toolbar button on the application’s toolbar. We then proceed to use the magic wand to create the functions createZooObjects and createStateMatrix. These are then copied and pasted into the appropriate file in the R directory.
RStudio has some facilities for navigating through a file full of functions. The “Go to file/function” search box in the main toolbar allows one to quickly and conveniently navigate to any function in a project. For in-file navigation, in the lower-left corner of the code-editor component sits a label (Figure 4-5) that contains the line and column number, and next to that, a combobox that can be popped up to select a function to jump to.
We next open a new R Script (Shift+Ctrl+N or through the File menu) for holding any functions for visualization and add a function that uses ggplot2 to make a graphic. We save the file and update our Files browser through its Refresh button.
Documenting Functions With roxygen2
The package.skeleton command makes the man subdirectory. In R, all exported functions must be documented in some file. Such files are written using R’s Rd markup language. Looking at the man directory, we see that two files were made: readNMRData.Rd (a stub for our function), and NMRpackage-package.Rd (a stub for documenting the entire package). We open up the latter and make the required changes. At a minimum, the lines that have paired double tildes are edited to match our idea of the package.
We could go on to edit the readNMRData.Rd template, but instead we will use the roxygen2 package to document our package’s functions. Although R is organized around a workflow where one writes the function then documents it separately (presumably after the function is done), many other programming languages have facilities for writing in a literate programming style using inline documentation. Some R programmers are used to this functionality (it simplifies iterative function writing and documenting) and the roxygen2 package makes this feasible. For the modest demands of this package, it is an excellent choice.
Rd format has a number of required sections, and using roxygen2 does not eliminate the need for following that structure. All directives appear in comments (we use ##'). Keywords are prefaced with an at symbol (@). The main sections that are usually defined are a title (taken from the first line), an optional description (taken from the first paragraph), the function’s arguments (defined through the @param tags), a description of the return value (@return), whether the function will be exported (@export), and, optionally, some examples. R has tools to check that the right format is followed. In particular, it will catch if you have failed to document all the arguments or if you misname a section tag.
The Rd markup is fairly straightforward and is described in the Writing R Extensions manual. An example of a documented function is shown in Figure 4-6.
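For readers without the figure at hand, the following sketch shows what such inline documentation might look like. The wording of the comments and the placeholder function body are assumptions made for this example, not a transcription of the figure.

```r
##' Read a naked mole rat data file
##'
##' Reads a data file produced by the experiment and returns its
##' contents as a data frame.
##'
##' @param f path to the data file
##' @return a data frame with one row per observation
##' @export
##' @examples
##' \dontrun{
##' x <- readNMRData("path/to/datafile.txt")
##' }
readNMRData <- function(f) {
  ## placeholder body; the real function does more cleaning
  x <- read.table(f, header = TRUE, stringsAsFactors = FALSE)
  x
}
```

The first comment line becomes the title, the next paragraph the description, and each tag fills in the corresponding Rd section when roxygen2 is run.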
We can also create a NEWS file to keep track of changes between versions of the package. This may not be useful for this package, but if the package proves popular with our colleagues, a NEWS file will help them see what has happened with our package (Figure 4-7).
The NEWS file is a plain-text file with a simple structure. We open it through the File > New menu, but this time select Text File. The code editor will present a different toolbar in this case, as it makes no sense to be able to source R code from this file.
The devtools Package
Testing a package can involve loading the package, testing it, making desired changes, then reloading the package. This workflow can get tedious, and it can even involve closing and restarting R to flush residual changes. The devtools package is designed to make this task easier.
If it isn’t installed, we can install it from CRAN using the Packages component (Figure 4-8). Click the Install Packages toolbar button and then type the desired package name into the dialog. (An auto-complete feature makes this easy.) Leaving the Install dependencies option checked will also install roxygen2 and the testthat package, if needed.
Rtools
Windows users will want to install Rtools, a set of development tools for Windows maintained by D. Murdoch. The files are found at http://cran.r-project.org/bin/windows/Rtools.
The devtools package provides the load_all function to reload the package without having to restart R. To use it we define a package variable (pkg) pointing to the directory (an “.Rpackages” file can be used to avoid this step), then load the code (Figure 4-9). The new functions do not appear in the Workspace browser, as they are stored in an environment just below the global workspace, but they do show up through RStudio’s code-completion functionality.
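The idea of an environment sitting just below the global workspace can be illustrated with base R alone. The sketch below is only an analogy and makes no claim about how devtools is implemented; the environment label and the function helloNMR are invented for the demonstration.

```r
## A stand-in for the place where a package's functions are kept.
pkgEnv <- new.env()
assign("helloNMR", function() "hello from the package", envir = pkgEnv)

## Put a copy of it on the search path, directly after the global workspace.
attach(pkgEnv, name = "demo:NMRpackage")

inWorkspace <- "helloNMR" %in% ls(globalenv())  # is it listed in the workspace?
isFound     <- exists("helloNMR")               # can R still find it by name?
position    <- match("demo:NMRpackage", search())
greeting    <- helloNMR()

## Clean up the search path again.
detach("demo:NMRpackage", character.only = TRUE)
```

The function is callable by name even though it is not an object in the global workspace, which is why a browser that lists only workspace objects would not show it.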
We can try it out. In doing so, if we realize that we would like to change some function, no problem. We make the adjustment in the code editor, save the file, then reissue the command load_all(pkg).
For working with documentation, the devtools package has the document function (as in document(pkg)) to call roxygen2 to create the corresponding Rd files and show_news to view the NEWS file.
Package Data
We can add our testing commands in an example, but we will need to have some data to use when we distribute our package. We wrote readNMRData to accept any data file in the same format, as we imagine our colleagues using it with other data sets generated by the experiment. However, we can combine the data we have into the package for testing and example purposes. R has the data directory for including data in a package. This data should be in a format R can easily read in, and ours isn’t (it has a different separator and we need to skip every other line). So instead, we use the inst directory to create our own data directory. We call this sampledata (not data, as this would interfere with the data directory that R processes automatically).
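Files placed under inst are copied to the top level of the installed package, so a file in inst/sampledata can be located with the base function system.file. The helper below is a sketch; its name and the file name used in the comment are hypothetical.

```r
## Hypothetical helper: return the full path of a bundled sample file.
sampleDataFile <- function(name, package = "NMRpackage") {
  path <- system.file("sampledata", name, package = package)
  ## system.file returns "" when the file (or the package) is not found
  if (!nzchar(path)) {
    stop("sample file not found: ", name)
  }
  path
}

## Once the package is installed, usage might look like this
## (the file name is made up):
## x <- readNMRData(sampleDataFile("moleRats.txt"))
```

Failing loudly on an empty path is a design choice here; it avoids passing an empty string on to a function that reads a file.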
We create the needed directories with the New Folder toolbar button in the Files browser.
How you get the package data file into this folder depends on how you are using RStudio. If you are using the desktop version, you simply copy the file over using your usual method (e.g., Finder, command line, Windows Explorer). If you are using the server version, then this won’t work. In that case, the Files component has an additional Upload toolbar button to allow you to upload your file. This button summons a dialog that allows you to browse for a file or a zip archive of many files (Figure 4-10).
Package Examples
R documentation files have the option of an “examples” section, where one would usually see documentation of example usage of the function(s). This is a very good idea, as it gives the user a sample template to build on. In Figure 4-11, we see sample code added to our readNMRData function’s documentation.
For an installed package, examples can be run by the user through the example function. During development with devtools, the programmer can use the run_examples function.
Adding Tests
Although examples will typically be run during package development, it is a good practice to include tests of the package’s core functions as well. Tests serve a different purpose than examples. Well-designed tests can help find bugs introduced by changes to functions, a not uncommon event. The devtools package can run tests (through testthat) that appear in the inst/tests subdirectory of the package.
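A test file is ordinary R code that states what a function should return. The sketch below tests a small hypothetical helper, keepOddLines, which is not part of the case study; the requireNamespace guard is there only so the snippet can be sourced where testthat is not installed, and would not normally appear in a test file.

```r
## Hypothetical helper: keep lines 1, 3, 5, ... of a character vector.
keepOddLines <- function(lines) lines[c(TRUE, FALSE)]

if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("keepOddLines drops every second line", {
    testthat::expect_equal(keepOddLines(c("a", "skip", "b", "skip")),
                           c("a", "b"))
    ## an empty input should give an empty result, not an error
    testthat::expect_equal(keepOddLines(character(0)), character(0))
  })
}
```

If a later change to the helper broke either expectation, running the tests would report it right away.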
Building and Installing the Package
Packages can be checked for issues, built for distribution, and installed for subsequent use. RStudio does not have any features for performing these tasks, but all can be done within devtools, or from a shell outside of the R process. For example, a UNIX or Mac OS X user could run:
> system("cd ~; R CMD build NMRpackage")
We could replace build with check to check our package for consistency with R’s expectations. Though checking isn’t required for sharing a package with colleagues, a package distributed on CRAN should pass the check phase cleanly. Checking is a good thing in any case.
Installing locally built packages can be done from the Install Packages dialog by selecting the option to install from a Package Archive File (.tgz).
The devtools package provides the functions check, build, and install for performing these three basic tasks.
For Windows users, the WinBuilder project
(http://win-builder.R-project.org) is a web service
that can be used to create packages. Otherwise, building R packages under
Windows is made easier using the Rtools
bundle mentioned earlier.