Chapter 4. Case Study: Creating a Package

Before describing more systematically the components that RStudio provides for development work in R (most importantly the source-code editor), we will pick up where we left off on our case study of analyzing the group behavior and individual movements of a colony of naked mole rats. Here, our goal is to illustrate one way to do package development with RStudio.

Imagine that, after a short time using RStudio interactively, we are pretty happy using the command line for short commands but have learned to really enjoy writing scripts in the code editor. Even two- or three-line commands are much easier to debug when written in the editor. The directness of typing at the command line isn’t lost, as our fingers are now trained to hit Ctrl+Enter to send the current line or selection to the R interpreter, or even Ctrl+Shift+Enter to send the entire buffer (Command, not Ctrl, for Mac users). We never need to leave the keyboard unless we choose to.

Along the way, we have been able to create a large script file that we now want to share with a colleague.

How do we do this? There are many ways. We could just send along the entire script, or with just a bit more extra work, we could share our work through a version control system. In some cases, this approach might be the best thing to do, as then our colleague can do exactly what we have been doing. However, there are many situations where this isn’t so great. For example, perhaps this colleague doesn’t know R too well and she just wants to have some functions to use. Plus, she may want to have some documentation on how to actually use the functions. Besides, we might want to share our work much more widely. At times like this, R users start to think about packages.

Packages are how R users extend R in a structured, reusable way. CRAN houses over 3,000 of them, and many more are scattered widely throughout the internet at R-specific repositories like those hosted by the Bioconductor project or on r-forge. Packages also appear on code-hosting sites such as http://github.com or http://code.google.com. However, we don’t need to get packages from a website. We can start by creating our own local packages to share with colleagues. Let’s see how, taking advantage of the features of the code-editor component of RStudio.

Creating Functions from Script Files

Currently, our script is one long set of commands that processes the data files and then makes a plot. We first want to turn some of the commands into functions. Functions make code reuse much more feasible. A basic pattern in R is to write functions for simple small tasks and then chain these tasks together using function composition. This is similar to the composition concept from mathematics, where we take the output from one function and use this as the input for another.
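As a small illustration of this pattern, the sketch below chains three single-purpose functions. The function names and the sample values are made up for the example and are not taken from our script.

```r
# Three small, single-purpose functions (all hypothetical examples).
dropMissing <- function(x) x[!is.na(x)]          # remove NA values
toMinutes   <- function(secs) secs / 60          # convert seconds to minutes
summarize   <- function(x) c(n = length(x), mean = mean(x))

# Composition: the output of one call is the input to the next.
durations <- c(120, NA, 300, 180)
result <- summarize(toMinutes(dropMissing(durations)))
result
```

Each piece can be tested on its own, and the composed call reads from the inside out, just as f(g(h(x))) does in mathematics.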

RStudio’s integrated Source code editor—where we wrote our script—makes working with functions quite easy. We’ll illustrate some of the features here.

For our task, we have a script that does four things to process the data:

  1. It reads in the data and does some data cleaning.

  2. It creates zoo objects for each mole rat.

  3. It merges these into one large zoo object.

  4. It makes a plot.

This naturally lends itself to four functions. Keeping our functions small and focused on a single task makes them easier to test and debug. It can also help later on in the development of a package, when we may think about combining similar tasks into more general functions, although we won’t see that here.

RStudio provides a convenient action for turning a series of commands into a function. The magic wand toolbar button in the code editor has the Extract Function action. We simply highlight the text we want and then wave the magic wand—tada! In Figures 4-1 and 4-2, we illustrate the changes introduced by the magic wand. Our first function will be the one that reads the data into a data frame where the time column is using one of R’s date classes.

Figure 4-1. Highlighting of the commands to be “wanded” into a function
Figure 4-2. A function generated by the magic wand, where the argument and return value were added by hand

The magic wand does most of the work, but not all in this case, as the text can’t adequately be parsed. In R, functions have arguments, a body of commands, a return value, and optionally are assigned to a name. We specify the name in the extract-function dialog, but for this instance added the function argument “f” and the return value “x” after the extraction.
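To make the four parts concrete, here is a minimal sketch of what such a function might look like. The file layout (tab-separated, with columns named time, ratID, and location) and the date format are assumptions made for the illustration; the real data files need more cleaning than this.

```r
# Hypothetical stand-in for the extracted function; the file layout
# and column names are assumptions, not the real data format.
readNMRData <- function(f) {                 # "f": the argument added by hand
  x <- read.table(f, header = TRUE, sep = "\t",
                  stringsAsFactors = FALSE)
  # Store the time column using one of R's date-time classes.
  x$time <- as.POSIXct(x$time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  x                                          # "x": the return value added by hand
}

# Try it on a small made-up file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("time\tratID\tlocation",
             "2011-06-01 10:00:00\tA\tnest",
             "2011-06-01 10:05:00\tB\ttunnel"), tmp)
d <- readNMRData(tmp)
class(d$time)
```

The name on the left of the assignment is what we typed into the extract-function dialog; everything between the braces is the body.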

We don’t try to automate the process of converting the rtf file into a txt file, as that isn’t so easy. We will put together the commands to process the data frame and create a list of zoo objects (one for each mole rat) and the commands to create a multivariate zoo object. This will be done with the magic wand in a similar manner as above.

A Package Skeleton

Packages must have a special structure, detailed in the Writing R Extensions manual that accompanies a standard R installation. We can consult that for detailed reference, but for now all we need to know is that the function package.skeleton will set up this structure for us. (The ProjectTemplate package can be used to provide even more detail to this process.)

This function needs, at a minimum, just two things: where and what. As in, where are we going to write our package files and what will we initially populate them with? We choose the directory ~/NMRpackage, and will start with one of the functions from our script:

> setwd("~")
> package.skeleton("NMRpackage", c("readNMRData"))
Creating directories ...
Creating DESCRIPTION ...
Creating Read-and-delete-me ...
Saving functions and data ...
Making help files ...
Done.
Further steps are described in './NMRpackage/Read-and-delete-me'.
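To experiment without touching the home directory, the same call can be pointed at a temporary directory. The sketch below is one way to do that; the body of readNMRData is a placeholder, and the exact set of files produced may vary with the version of R.

```r
# Build the skeleton in a throwaway directory rather than in "~".
tmp <- tempfile("pkgdemo")
dir.create(tmp)

# package.skeleton() looks objects up by name in an environment;
# this placeholder function body is for illustration only.
env <- new.env()
assign("readNMRData", function(f) read.table(f, header = TRUE), envir = env)

package.skeleton("NMRpackage", list = "readNMRData",
                 environment = env, path = tmp)

# List everything that was generated, including the R and man subdirectories.
list.files(file.path(tmp, "NMRpackage"), recursive = TRUE)
```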

We now want to inform RStudio that we are working on a new project, allowing us to compartmentalize our session data and accompanying functions. A more detailed description of projects in RStudio is postponed to Organizing Activities with Projects; for now, we note that we used the directory just created by package.skeleton.

After creating a new project, we refresh the Files browser to show us which files were created (Figure 4-3).

Figure 4-3. Directory structure after package.skeleton call

We see two unexpected files in the base directory and two subdirectories. We first investigate what is in the Read-and-delete-me by clicking on the link and reading. For now, nothing we need. It says to delete the file, so we oblige by selecting the file’s checkbox and clicking the Delete toolbar button.

The DESCRIPTION file is used by R to organize its packages. Ours needs to be updated to reflect our package. Clicking the link opens the file in the code editor. Here we edit the Title: field and some others. Since our package will rely on the zoo and ggplot2 packages, we add those to the Depends field. This file is in dcf format with a keyword (the name before the colon) and value on one line. If you need more lines for the value, just give any additional lines some indented space, as was done for the “Description:” line (see Figure 4-4).

Figure 4-4. Editing the stock DESCRIPTION file template to match our package
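The keyword-and-value layout, including an indented continuation line, can be checked from R itself with read.dcf. The field values below are illustrative guesses at what our DESCRIPTION file might contain, not a copy of the real one.

```r
# A hypothetical DESCRIPTION file; all field values are made up.
desc_text <- c(
  "Package: NMRpackage",
  "Type: Package",
  "Title: Tools for naked mole rat movement data",
  "Version: 0.1",
  "Depends: zoo, ggplot2",
  "Description: Reads the tracking files, builds zoo objects,",
  "    and plots the movements of each animal.",   # indented continuation line
  "License: GPL-3"
)
desc_file <- tempfile()
writeLines(desc_text, desc_file)

# read.dcf() returns a character matrix with one column per keyword.
desc <- read.dcf(desc_file)
colnames(desc)
desc[1, "Depends"]
desc[1, "Description"]
```

If a continuation line is not indented, read.dcf will complain about a malformed line, which is a quick way to catch formatting slips before building the package.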

The R directory is where all the programming is done. In this directory we have the files containing our function definitions. We change our working directory (Ctrl+Shift+K), and the file browser updates to show this directory.

We see that the call to package.skeleton created a file named readNMRData.R, containing the definition of the one function we gave it. We could have one file per function, but that will quickly get unwieldy. We could also put all our functions into one file—but again that gets bulky. A better strategy is to group similar functions together into a file. For now, we will create a file to hold our data-processing functions (process.R), and another file for our still-to-be-written visualization functions (visualize.R).

To rename our file through the Files browser, we select its checkbox and then click the Rename toolbar button. A dialog prompts us for the new name. We open this file for editing by clicking on its link. We then open our original script file (one-big-script-file.R, which isn’t in our new project) by using the Open File toolbar button on the application’s toolbar. We then proceed to use the magic wand to create functions createZooObjects and createStateMatrix. These are then copy-and-pasted into the appropriate file in the R directory.

RStudio has some facilities for navigating through a file full of functions. The “Go to file/function” search box in the main toolbar allows one to quickly and conveniently navigate to all the functions in a project. For in-file navigation, in the lower-left corner of the code-editor component sits a label (Figure 4-5) that contains the line and column number, and next to that, a combobox that can be popped up to select a function to jump to.

Figure 4-5. The function pop up allows you to quickly navigate to a function in a file containing many functions

We next open a new R Script (Shift+Ctrl+N or through the File menu) for holding any functions for visualization and add a function to use ggplot2 to make a graphic. We save the file and update the Files browser through its Refresh button.

Documenting Functions With roxygen2

The package.skeleton command makes the man subdirectory. In R, all exported functions must be documented in some file. Such files are written using R’s Rd markup language. Looking at the man directory, we see that two files were made: readNMRData.Rd (a stub for our function), and NMRpackage-package.Rd (a stub for documenting the entire package). We open up the latter and make the required changes—at a minimum, the lines that have paired double tildes are edited to match our idea of the package.

We could go on to edit the readNMRData.Rd template, but instead we will use the roxygen2 package to document our package’s functions. Although R is organized around a workflow where one writes the function then documents it separately (presumably after the function is done), many other programming languages have facilities for writing in a literate programming style using inline documentation. Some R programmers are used to this functionality (it simplifies iterative function writing and documenting) and the roxygen2 package makes this feasible. For the modest demands of this package, it is an excellent choice.

Rd format has a number of required sections, and using roxygen2 does not eliminate the need for following that structure. All directives appear in comments (we use ##'). Keywords are prefaced with an at symbol (@). The main sections that are usually defined are a title (taken from the first line), an optional description (taken from the first paragraph), the function’s arguments (defined through the @param tags), a description of the return value (@return), whether the function will be exported (@export), and, optionally, some examples. R has tools to check that the right format is followed. In particular, it will catch if you have failed to document all the arguments or if you misname a section tag.

The Rd markup is fairly straightforward and is described in the Writing R Extensions manual. An example of a documented function is shown in Figure 4-6.

Figure 4-6. Illustration of using roxygen2 to document a function
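For readers without the figure at hand, the layout of a roxygen2 block looks roughly like the following. The function secondsToMinutes is a hypothetical helper invented for this sketch; it is not one of the package's functions.

```r
##' Convert durations from seconds to minutes
##'
##' A hypothetical helper, used here only to show the layout of a
##' roxygen2 block.
##'
##' @param secs numeric vector of durations, in seconds
##' @return numeric vector of the same length, in minutes
##' @export
secondsToMinutes <- function(secs) {
  secs / 60
}
```

The first comment line becomes the title, the paragraph after the blank comment line becomes the description, and each tag fills in the corresponding section of the generated Rd file.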

We can also create a NEWS file to keep track of changes between versions of the package. This may not be useful for this package, but if the package proves popular with our colleagues, a NEWS file will help them see what has happened with our package (Figure 4-7). The NEWS file is a plain-text file with a simple structure. We open it through the File > New menu, but this time select Text File. The code editor will present a different toolbar in this case, as it makes no sense to be able to source R code from this file.

Figure 4-7. Editing a text file in the Source code editor shows that the toolbar is file-type dependent

The devtools Package

Testing a package can involve loading the package, testing it, making desired changes, then reloading the package. This workflow can get tedious—it can even involve closing and restarting R to flush residual changes. The devtools package is designed to make this task easier.

If it isn’t installed, we can install it from CRAN using the Packages component (Figure 4-8). Click the Install Packages toolbar button and then type the desired package name into the dialog. (An auto-complete feature makes this easy.) Leaving the Install dependencies option checked will also install roxygen2 and the testthat package, if needed.

Rtools

Windows users will want to install Rtools, a set of development tools for Windows maintained by D. Murdoch. The files are found at http://cran.r-project.org/bin/windows/Rtools .

Figure 4-8. The Install Packages dialog for installing the devtools package

The devtools package provides the load_all function to reload the package without having to restart R. To use it we define a package variable (pkg) pointing to the directory (an “.Rpackages” file can be used to avoid this step), then load the code (Figure 4-9). The new functions do not appear in the Workspace browser, as they are stored in an environment just below the global workspace, but they do show up through RStudio’s code-completion functionality.

Figure 4-9. The commands to use devtools for package development

We can try it out. In doing so, if we realize that we would like to change some function, no problem. We make the adjustment in the code editor, save the file, then reissue the command load_all(pkg).

For working with documentation, the devtools package has the document function (as in document(pkg)) to call roxygen2 to create the corresponding Rd files and show_news to view the NEWS file.
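Put together, one round of the edit-and-reload cycle might look like the sketch below. It assumes that devtools is installed and that the package lives in ~/NMRpackage; the guard simply skips the calls when either assumption does not hold. Here pkg is a plain path string, which may differ from how the variable is defined in the figure.

```r
# Path to the package directory (an assumption based on the earlier
# package.skeleton call).
pkg <- "~/NMRpackage"

if (requireNamespace("devtools", quietly = TRUE) &&
    dir.exists(path.expand(pkg))) {
  devtools::load_all(pkg)   # (re)load the package code without restarting R
  devtools::document(pkg)   # run roxygen2 to regenerate the files in man/
}
```

After editing and saving a function, rerunning the load_all line is all that is needed to try the new version.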

Package Data

We can add our testing commands in an example, but we will need to have some data to use when we distribute our package. We wrote readNMRData to accept any data file in the same format, as we imagine our colleagues using it with other data sets generated by the experiment. However, we can combine the data we have into the package for testing and example purposes. R has the data directory for including data in a package. This data should be in a format R can easily read in—ours isn’t (it has a different separator and we need to skip every other line). So instead, we use the inst directory to create our own data directory. We call this sampledata (not data, as this would interfere with the data directory that R processes automatically). We create the needed directories with the New Folder toolbar button in the Files browser.
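The directories can also be created from the console, and system.file is the usual way to find such a file again once the package is installed. The sketch below works in a temporary directory, and the file name sample.txt is a placeholder of our own choosing.

```r
# Stand-in for the package's top-level directory, in a temporary location.
pkg_dir <- file.path(tempfile("demo"), "NMRpackage")
sample_dir <- file.path(pkg_dir, "inst", "sampledata")
dir.create(sample_dir, recursive = TRUE)

# Copy a (made-up) raw data file into inst/sampledata.
raw <- tempfile(fileext = ".txt")
writeLines("placeholder contents", raw)
file.copy(raw, file.path(sample_dir, "sample.txt"))

# On installation, the contents of inst/ are copied to the top level of
# the installed package, so the file is looked up under "sampledata".
# system.file() returns "" when the package or file cannot be found.
f <- system.file("sampledata", "sample.txt", package = "NMRpackage")
nzchar(f)
```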

How you get the package data file into this folder depends on how you are using RStudio. If you are using the desktop version, you simply copy the file over using your usual method (e.g., Finder, command line, Windows Explorer). If you are using the server version, then this won’t work. In that case, the Files component has an additional Upload toolbar button to allow you to upload your file. This button summons a dialog that allows you to browse for a file or a zip archive of many files (Figure 4-10).

Figure 4-10. Dialog for uploading a file to the server (server usage only)

Package Examples

R documentation files have the option of an “examples” section, where one would usually see documentation of example usage of the function(s). This is a very good idea, as it gives the user a sample template to build on. In Figure 4-11, we see sample code added to our readNMRData function’s documentation.

Figure 4-11. Adding an example to a function’s documentation with roxygen2

For an installed package, examples can be run by the user through the example function. During development with devtools, the programmer can use the run_examples function.
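An examples section written with roxygen2 might look something like the sketch below. The file name sample.txt and the one-line function body are placeholders; only the @examples tag and the use of system.file to locate a file shipped under inst/sampledata are the point here.

```r
##' Read a naked mole rat data file
##'
##' @param f path to a data file
##' @return a data frame
##' @export
##' @examples
##' ## "sample.txt" is a hypothetical file name
##' f <- system.file("sampledata", "sample.txt", package = "NMRpackage")
##' if (nzchar(f)) head(readNMRData(f))
readNMRData <- function(f) {
  read.table(f, header = TRUE)   # placeholder body for the illustration
}
```

Guarding the call with nzchar(f) keeps the example from failing if the sample file is missing.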

Adding Tests

Although examples will typically be run during package development, it is a good practice to include tests of the package’s core functions as well. Tests serve a different purpose than examples. Well-designed tests can help find bugs introduced by changes to functions—a not uncommon event. The devtools package can run tests (through testthat) that appear in the inst/tests subdirectory of the package.
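A test file is ordinary R code built from testthat's test_that and expect_* functions. The sketch below tests a hypothetical helper, secondsToMinutes, which is defined inline so the block runs on its own; in the package, a file like this would sit in the tests subdirectory and the function would come from the package itself.

```r
# Hypothetical function under test (defined here so the sketch is
# self-contained).
secondsToMinutes <- function(secs) secs / 60

# Run the tests only if testthat is available.
if (requireNamespace("testthat", quietly = TRUE)) {
  library(testthat)
  test_that("secondsToMinutes converts values and preserves length", {
    expect_equal(secondsToMinutes(c(60, 90)), c(1, 1.5))
    expect_equal(length(secondsToMinutes(numeric(0))), 0)
  })
}
```

A failing expectation stops with a message naming the test, which makes it easy to see which change to a function introduced a bug.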

Building and Installing the Package

Packages can be checked for issues, built for distribution, and installed for subsequent use. RStudio does not have any features for performing these tasks, but all can be done within devtools, or from a shell outside of the R process. For example, a UNIX or Mac OS X user could run:

> system("cd ~; R CMD build NMRpackage")

We could replace build with check (that is, R CMD check NMRpackage) to check our package for consistency with R’s expectations. Though checking isn’t required for sharing a package with colleagues, a package distributed on CRAN should pass the check phase cleanly. Checking is a good thing in any case.

Installing locally built packages can be done from the Install Packages dialog by selecting the option to install from a Package Archive File (.tgz).

The devtools package provides the functions check, build, and install for performing these three basic tasks.
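Used from the console, the three calls might be strung together as in the sketch below. As before, it assumes that devtools is installed and that the package directory is ~/NMRpackage, and it does nothing when either is missing.

```r
# Path to the package directory (an assumption based on the earlier
# package.skeleton call).
pkg <- "~/NMRpackage"

if (requireNamespace("devtools", quietly = TRUE) &&
    dir.exists(path.expand(pkg))) {
  devtools::check(pkg)     # check the package, as R CMD check does
  devtools::build(pkg)     # create the source archive for distribution
  devtools::install(pkg)   # install the package into the local library
}
```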

For Windows users, the WinBuilder project (http://win-builder.R-project.org) is a web service that can be used to create packages. Otherwise, building R packages under Windows is made easier using the Rtools bundle mentioned earlier.
