Chapter 1. Overview, Installation
This book introduces users to the RStudio™ Integrated Development Environment (IDE) for using and programming R, the widely used open-source statistical computing environment. RStudio is a separate open-source project that brings many powerful coding tools together into an intuitive, easy-to-learn interface. RStudio runs on all major platforms (Windows, Mac, Linux) and through a web browser (using the server installation). This book should appeal to newer R users, students who want to explore the interface to get the most out of R, and long-time R users looking for a more modern development environment.
RStudio is periodically released as a stable version, with daily builds released in between. This book describes stable release 0.95, which introduced many new features to RStudio: projects for organizing code, powerful code navigation tools, and integration with two popular version control systems.
We will begin with a quick overview of R and IDEs before diving into RStudio.
What is R?
R is an open-source software environment for statistical computing and graphics. R compiles and runs on Windows, Mac OS X, and numerous UNIX platforms (such as Linux). For most platforms, R is distributed in binary format for ease of installation. The R software project was first started by Robert Gentleman and Ross Ihaka. The language was very much influenced by the S language, which was originally developed at Bell Laboratories by John Chambers and colleagues. Since then, with the direction and talents of R’s core development team, R has evolved into the lingua franca for statistical computations in many disciplines of academia and various industries.
R is much more than just its core language. It has a worldwide repository system, the Comprehensive R Archive Network (CRAN, http://cran.r-project.org), for user-contributed add-on packages that supplement the base distribution. As of 2011, there were more than 3,000 such packages hosted on CRAN and many more on other sites. In total, R currently has functionality to address an enormous range of problems and still has room to grow.
R is designed around its core scripting language but also allows integration with compiled code written in C, C++, Fortran, Java, etc., for computationally intensive tasks or for leveraging tools provided for other languages.
What is an IDE?
R, like other programming languages, is extended (or developed) through user-written functions. An integrated development environment (IDE), such as RStudio, is designed to facilitate such work. In addition, unlike many other statistical software packages in which a graphical user interface is employed, a typical user interacts with R primarily through the command line. An IDE for R then must also include a means for issuing commands interactively. R is not unique in this respect, and IDEs for interactive scientific programming languages have matured to include features such as:
A console for issuing commands.
Source-code editor; at its core, development involves the act of programming, and this task is inevitably done with a source-code editor. Such editors have been around for some time now, and expectations for editors are now quite demanding. A typical set of expectations includes:
A rich set of keyboard shortcuts
Automatic source-code formatting, assistance with parentheses, keyword highlighting
Code folding and easy navigation through a file and among files
Interfaces for compiling or running software
Integration with report-writing tools
Object browsers; in interactive use, a user’s workspace includes variables that have been defined. An object browser allows the user to identify quickly the type and values for each such variable.
Object editors; from an object browser, a means to inspect or edit objects is typically provided.
Integration with the underlying documentation.
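The information an object browser displays can be approximated at the plain R command line. The following is a minimal sketch using only base R (ls(), get(), class(), and str()); the helper browse_workspace() and the sample variables are hypothetical, made up for illustration, and functions are skipped to keep the listing short.

```r
# Sample variables, as a user might define during an interactive session
n <- 42L
scores <- c(88.5, 92.0, 79.5)
label <- "trial run"

# Hypothetical helper: one row per (non-function) variable in the workspace,
# showing its name, its class, and the compact one-line display from str()
browse_workspace <- function(env = globalenv()) {
  vars <- Filter(function(v) !is.function(get(v, envir = env)), ls(envir = env))
  data.frame(
    name  = vars,
    class = vapply(vars, function(v) class(get(v, envir = env))[1], character(1)),
    value = vapply(vars, function(v) {
      # capture.output() collects what str() would print; collapse it to one line
      paste(trimws(capture.output(str(get(v, envir = env)))), collapse = " ")
    }, character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

print(browse_workspace())
```

An IDE's object browser keeps such a view up to date automatically and adds a way to open each object in an editor; the sketch only produces a one-time snapshot.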
Some existing IDEs for R are listed in Table 1-1.
ESS: ESS (http://ess.r-project.org) is a powerful and commonly used interface for R that integrates the venerable Emacs editor with R. There are numerous conveniences, but some find that it is difficult to learn and has an old-school feel, which discourages adoption.
Sciviews: An R API and extension for the Komodo code editor.
JGR: A Java-based editor that interfaces with R through the …
Tinn-R: An extension for the Tinn editor that allows integration with an underlying R process.
RGui (Windows): The Windows GUI for R (the default interface) has some of the features of an IDE.
Mac OS X GUI: Like the Windows GUI, it provides the basic features of an IDE.
In its short existence, the RStudio project already provides nearly all the desired features for an IDE in a novel way, making it easier and more productive to use R. Further, new features are being added all the time. Some current highlights are:
The main components of an IDE are all nicely integrated into a four-pane layout that includes a console for interactive R sessions, a tabbed source-code editor to organize a project’s files, and tabbed panes within notebooks to organize less central components.
The source-code editor is easy to use and feature-rich, has excellent code-navigation features, and is well integrated with the built-in console.
The console and source-code editor are tightly linked to R’s internal help system through tab completion and the help page viewer component.
The project feature makes it easy to organize different workflows. Setting up different projects is a snap, and switching between them is even easier.
RStudio provides many convenient and easy-to-use administrative tools for managing packages, the workspace, files, and more.
The IDE is available for the three main operating systems and can be run through a web browser for remote access.
RStudio is much easier to learn than Emacs/ESS, easier to configure and install than Eclipse/StatET, has a much better editor than JGR, is better organized than Sciviews, and, unlike RGui, is available on more platforms than just Windows.
The RStudio program can be run on the desktop or through a web browser. The desktop version is available for Windows, Mac OS X, and Linux platforms and behaves similarly across all platforms, with minor differences for keyboard shortcuts.
To support so many platforms, RStudio leverages numerous existing web technologies in its design. For the desktop application, it displays its web-based interface within an industry-standard HTML widget provided by Qt (a cross-platform application and UI framework). Consequently, R users can have a feature-rich and consistent programming environment for R their way, whether desktop- or web-based. Web-based usage is done through a trusted server within a department or organization (though a “cloud” service may be forthcoming).
RStudio is the brainchild of J. J. Allaire, who, with his brother, previously had tremendous success developing the influential ColdFusion IDE and scripting language for web development. Allaire is currently joined by the very able Joseph Cheng, Joshua Paulson, and Paul DiCristina. In the short time that their initial beta has been available, they have proven to be very responsive to user input. RStudio is under active development. As such, elements discussed in this book may have changed by the time you read it. Sorry…but you’ll likely be better off with the new feature than my description of the old one.
Like R, RStudio is an open-source project. Its stated goal (which it is already meeting) is “...to develop a powerful tool that supports the practices and techniques required for creating trustworthy, high-quality analysis.” The codebase is released under the AGPLv3 license and is available from GitHub (https://github.com/rstudio/rstudio).
RStudio is built on top of many other open-source projects. Most visible of these are GWT, Google’s Web Toolkit; Qt, the graphical toolkit of Nokia; and …; these and other projects are listed in RStudio’s … dialog. The bulk of the code is written in C++ and Java, the language for working with GWT.
We will reverse things slightly by beginning with the process of starting RStudio, and postpone any installation issues for a bit. As RStudio can be used from the desktop or through a server, there are two ways of starting it.
For the desktop version, RStudio is started like most other applications. In Figure 1-1, we see the application running under a version of Windows. There it was started by clicking the menu item created during installation. For Mac OS X users, one clicks on the RStudio icon in the Applications list. For Linux users, the command rstudio will open the window. It may also be installed with a menu item, as is done with Ubuntu, where it appears under …
In Figure 1-1 we see three main panes: the Console, which should look familiar to any R user; a tabbed pane (with no items, as the initial workspace is empty); and the History interface. The latter two are part of notebooks that can contain multiple panes. The Source pane, or code editor, is not open in the screenshot, as no files are open for editing or viewing.
Starting the server version requires one to know the appropriate URL for the resource. We used a local URL for this book, but the real value comes from using RStudio as a resource on the wider internet. When accessing RStudio, one must first authenticate. The basic screen to do so looks like Figure 1-2. Authentication depends on the server configuration, but by default users authenticate against the user accounts on the machine, so the web administrator should have provided a secure means to access RStudio.
When using the server version, only one instance per user may be opened. If a new session is started—on a different machine, or even if just in a different tab of the same browser—the old one is disconnected and a notification issued.
When R is started, it follows this process:
R is started in the working directory.
If present, the .Rprofile file’s commands are executed.
If present, the .RData file is loaded.
Other actions described in
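To make steps 1 and 2 concrete, here is a minimal sketch that writes a hypothetical .Rprofile into a temporary directory and then runs it by hand with source(), which approximates what R does automatically at startup. The directory name and the file's contents are illustrative assumptions, and R's real startup also consults other locations (such as the user's home directory) that this sketch ignores.

```r
# A throwaway directory standing in for a project's working directory
proj_dir <- file.path(tempdir(), "rprofile-demo")
dir.create(proj_dir, showWarnings = FALSE)

# Hypothetical .Rprofile contents: commands the user wants run in every session
writeLines(c(
  "options(digits = 4)",
  "message('Project .Rprofile loaded')"
), file.path(proj_dir, ".Rprofile"))

# Steps 1 and 2 by hand: move to the directory, then run its .Rprofile if present
old_dir <- setwd(proj_dir)
if (file.exists(".Rprofile")) source(".Rprofile")
setwd(old_dir)  # setwd() returned the previous directory, so restore it

print(pi)  # now printed with 4 significant digits: [1] 3.142
```

In a real project, the file would simply live in the directory where R is started, and R would run it without the explicit source() call.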
When R quits, the user is asked “Save workspace image?” If the workspace is saved, its contents are written to an .RData file, so that the workspace can persist between sessions when R is restarted. (One can also initiate this with …)
This process allows R users to place commands they want to run in every session in an .Rprofile file, and to have per-directory .RData files, so that different global workspaces can be used for different projects.
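The save-and-restore cycle can be sketched with save.image() and load() from base R. This minimal example writes the image to a temporary file rather than to the working directory's .RData, so that no existing workspace is overwritten; the variable names are made up for illustration.

```r
# Objects created during a session
x <- c(2, 4, 6, 8)
x_mean <- mean(x)

# Save the whole global workspace, as answering "yes" to "Save workspace image?"
# does (a temporary file stands in for ./.RData here)
image_file <- file.path(tempdir(), ".RData")
save.image(file = image_file)

# Simulate a fresh session by removing the objects...
rm(x, x_mean)
print(exists("x_mean"))  # [1] FALSE

# ...then restore them, as R does when it finds .RData at startup
load(image_file)
print(x_mean)            # [1] 5
```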
RStudio provides a very useful “project” feature that allows a user to switch quickly between projects (Organizing Activities with Projects). Each project may have its own working directory, workspace, and collection of files in the Source component. The current project name is listed on the far right of the main application toolbar in a combobox that allows one to switch between open projects, open an existing project, or create a new project.
RStudio does not require a special version of R to run, as long as it is a fairly modern one (R 2.11.1 or later). It will work with binary versions from CRAN or user-compiled versions. As such, when RStudio starts up, it must be able to locate a version of R, which could possibly reside in many different places. Usually RStudio just finds the right one, but one can bypass the search process. The online document at http://www.rstudio.org/docs/advanced/versions_of_r details how to specify which R installation to use. In short, it depends on the underlying operating system. For Windows desktop users, it can be specified in the Options dialog (see The Options Dialog), or chosen if the Ctrl key is held on startup. For Linux and Mac OS X users, one can set an environment variable, as seen here:
$ export RSTUDIO_WHICH_R=/usr/local/bin/R
Web-based users do not have this choice, as the R installation is determined by whoever configures the server.
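Before pointing RStudio at a particular R installation, it can help to confirm from within that R that it meets the minimum version and to find the path of its executable. The following minimal sketch uses getRversion() and R.home(); treating R.home("bin")/R as a suitable value for RSTUDIO_WHICH_R is an assumption based on typical Linux and Mac OS X layouts (on Windows the executable is R.exe, and the Options dialog is used instead).

```r
# RStudio needs R 2.11.1 or later; getRversion() returns a comparable version object
meets_requirement <- getRversion() >= "2.11.1"

# Path to the R executable of the running installation. On typical Linux and
# Mac OS X installations, this is the kind of path one would assign to
# RSTUDIO_WHICH_R (an assumption; on Windows the binary is R.exe instead)
r_binary <- file.path(R.home("bin"), "R")

cat("R version:", as.character(getRversion()), "\n")
cat("Meets RStudio minimum (2.11.1):", meets_requirement, "\n")
cat("R executable:", r_binary, "\n")
```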
Layout of the Components
The RStudio interface consists of several main components sitting below a top-level toolbar and menu bar. Although this placement can be customized, the default layout utilizes four main panes in the following positions:
In the lower left is a Console for interacting with an R process (see Chapter 3).
In the lower right are tabbed panes for interacting with the Files (The File Browser), Plots (Graphics in RStudio), Packages (Package Maintenance), and Help system components (The Help Page Viewer). If the facilities are present, an additional tab for version control (Version Control with RStudio) is presented.
The Console pane is somewhat privileged: it is always visible, and it has a title bar. For the other components, their tab serves as a title bar. These panes have page-specific toolbars (perhaps more than one), which in the case of the Source pane are also …
The user may change the default dimensions of each pane. An adjustable divider in the middle of the interface, between the left and right sides, lets the user adjust the horizontal allocation of space. Each side also has its own divider to adjust the vertical space between its two panes. In addition, the title bar of each pane has icons to shade a component, maximize it vertically, or share the space.
One can easily switch between components using the mouse. The View menu also has subitems for this task. For power users, the keyboard shortcuts listed in Table 1-2 are useful. (A full list of keyboard shortcuts is available through the Help > Keyboard Shortcuts menu item.)
The Options Dialog
RStudio preferences are adjusted through the Options dialog, which has five panels: general properties, editing properties (Figure 3-4), appearance properties, pane layout (Figure 1-4), and version control (which requires additional support tools to be installed).
The pane layout panel allows the user to determine which panes go in which corners and, for the supplemental panes (not the Console or Source editor), where those panes’ tabs appear. One modifies a placement simply by adjusting a combobox or by checking one of the checkboxes. In Figure 1-4, the choices put the code editor on the right, the console in the upper right, and the file browser in the upper left. There are many examples of pane placement at http://rstudio.org/screenshots/.
The appearance panel of the Options dialog allows one to set the default font size and to change the theme used for editing in the console and source-code editor. This book uses the default TextMate theme for its screenshots.
Installing RStudio is usually a straightforward process.
First, RStudio requires a working, relatively modern R installation. If that is not already present, then one should consult http://cran.r-project.org to learn how to install R for the given operating system. For Windows and Mac OS X, one can simply download a self-installing binary; for Linux, installation varies. For the Debian distribution (including Ubuntu), the R system can be installed using the regular package-management tools. Of course, as R is open source, one can also compile and install it using the source code.
The RStudio package is available for download from http://www.rstudio.org/download/. There is a choice between a Desktop version and a Server version. The Desktop version is appropriate for single-user use. The files come in common formats for binary installation (e.g., exe, dmg, deb, or rpm). One downloads the file and installs it like any other program.
For those seeking the latest features, follow the link on http://www.rstudio.org/download/daily to get the binaries for the most recent (but not necessarily stable) build.
Installing a server version requires more work and care. Some directions are given at http://rstudio.org/docs/.
One can also install RStudio from its source code. A link for the source “tarball” for the current stable version appears on the appropriate download page. For the adventurous, the latest development build files are available from https://github.com/rstudio/rstudio. Installation details are in the … file accompanying the source code. The same source is used to compile both the Desktop and Server versions.
As RStudio depends on some of the latest features of many moving parts, such as GWT, there can be issues with compiling from the source. The support forums (http://support.rstudio.org/) are an excellent place to find specific answers to any issues.
RStudio creates hidden files for itself to store information, including logging information. When there are issues at startup, the log can be consulted for direction as to what is going wrong.
For desktop users on Mac OS X and Linux, the log directory is ~/.rstudio-desktop/log. On Windows, it is %localappdata%\RStudio-Desktop\log (Windows Vista and 7) or %USERPROFILE%\Local Settings\Application Data\RStudio-Desktop\log (Windows XP).
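A small helper can compute the log directory from the locations listed above. The function rstudio_log_dir() below is hypothetical, written for illustration; it tells Windows Vista/7 apart from XP by assuming that %LOCALAPPDATA% is defined on the former but not on the latter, and the paths may differ in other RStudio versions.

```r
# Hypothetical helper: RStudio Desktop log directory, per the locations above
rstudio_log_dir <- function() {
  if (.Platform$OS.type == "windows") {
    local_app_data <- Sys.getenv("LOCALAPPDATA")  # "" when the variable is unset
    if (nzchar(local_app_data)) {
      # Windows Vista and 7 define %LOCALAPPDATA%
      file.path(local_app_data, "RStudio-Desktop", "log")
    } else {
      # Windows XP: fall back to the older profile layout under %USERPROFILE%
      file.path(Sys.getenv("USERPROFILE"), "Local Settings",
                "Application Data", "RStudio-Desktop", "log")
    }
  } else {
    # Mac OS X and Linux
    path.expand("~/.rstudio-desktop/log")
  }
}

log_dir <- rstudio_log_dir()
print(log_dir)
# List the log files only if the directory exists on this machine
if (dir.exists(log_dir)) print(list.files(log_dir))
```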
In the application’s menu bar, the … > Diagnostics item can be used to find the log …
Updating RStudio is also straightforward.
To see whether an update is available, use the Help > Check for Updates menu item, which opens a dialog with update information.
If an update is available, one can quit RStudio, install the new version, then restart. RStudio writes session information to the user’s home directory (e.g., to the file ~/.rstudio-desktop). This information persists between upgrades.