Getting Started with RStudio by John Verzani

Chapter 1. Overview, Installation

This book introduces users to the RStudio™ Integrated Development Environment (IDE) for using and programming R, the widely used open-source statistical computing environment. RStudio is a separate open-source project that brings many powerful coding tools together into an intuitive, easy-to-learn interface. RStudio runs on all major platforms (Windows, Mac, Linux) and through a web browser (using the server installation). This book should appeal to newer R users, students who want to explore the interface to get the most out of R, and long-time R users looking for a more modern development environment.

RStudio is periodically released as a stable version, and has daily releases in between. This book describes the stable release 0.95, which introduced many new features to RStudio: projects for organizing code, powerful code navigation tools, and integration with two popular version control systems.

We will begin with a quick overview of R and IDEs before diving into RStudio.

What is R?

R is an open-source software environment for statistical computing and graphics. R compiles and runs on Windows, Mac OS X, and numerous UNIX platforms (such as Linux). For most platforms, R is distributed in binary format for ease of installation. The R software project was first started by Robert Gentleman and Ross Ihaka. The language was very much influenced by the S language, which was originally developed at Bell Laboratories by John Chambers and colleagues. Since then, with the direction and talents of R’s core development team, R has evolved into the lingua franca for statistical computations in many disciplines of academia and various industries.

R is much more than just its core language. It has a worldwide repository system, the Comprehensive R Archive Network (CRAN)—http://cran.r-project.org—for user-contributed add-on packages to supplement the base distribution. As of 2011, there were more than 3,000 such packages hosted on CRAN and numerous more on other sites. In total, R currently has functionality to address an enormous range of problems and still has room to grow.

R is designed around its core scripting language but also allows integration with compiled code written in C, C++, Fortran, Java, etc., for computationally intensive tasks or for leveraging tools provided for other languages.

What is an IDE?

R, like other programming languages, is extended (or developed) through user-written functions. An integrated development environment (IDE), such as RStudio, is designed to facilitate such work. In addition, unlike many other statistical software packages in which a graphical user interface is employed, a typical user interacts with R primarily through the command line. An IDE for R then must also include a means for issuing commands interactively. R is not unique in this respect, and IDEs for interactive scientific programming languages have matured to include features such as:

  • A console for issuing commands.

  • Source-code editor; at its core, development involves the act of programming, and this task is inevitably done with a source-code editor. Such editors have been around for some time now, and expectations for editors are now quite demanding. A typical set of expectations includes:

    • A rich set of keyboard shortcuts

    • Automatic source-code formatting, assistance with parentheses, keyword highlighting

    • Code folding and easy navigation through a file and among files

    • Context-sensitive assistance

    • Interfaces for compiling or running of software

    • Project-management features

    • Debugging assistance

    • Integration with report-writing tools

  • Object browsers; in interactive use, a user’s workspace includes variables that have been defined. An object browser allows the user to identify quickly the type and values for each such variable.

  • Object editors; from an object browser, a means to inspect or edit objects is typically provided.

  • Integration with the underlying documentation.

  • Plot-management tools.

Some existing IDEs for R are listed in Table 1-1.

Table 1-1. Some existing IDEs for R
| Name | Platforms | Description |
| --- | --- | --- |
| ESS | All | ESS (http://ess.r-project.org) is a powerful and commonly used interface for R that integrates the venerable Emacs editor with R. There are numerous conveniences, but some find that it is difficult to learn and has an old-school feel, which precludes adoption. |
| Eclipse | All | The open-source StatET plugin (http://www.walware.de/goto/statet) turns Eclipse, a Java-based multipurpose IDE, into a full-featured IDE for R. |
| SciViews | All | An R API and extension for the Komodo code editor. |
| JGR | All | Java-based editor that interfaces with R through the rJava and JRI packages. The Deducer package adds a suite of data analysis tools. |
| Tinn-R | Windows | An extension for the Tinn editor that allows integration with an underlying R process. |
| Notepad++ | Windows | The NppToR extension allows the Notepad++ editor to interact with an R process. |
| RGui | Windows | The Windows GUI for R (the default interface) has some of the features of an IDE. |
| R.app | Mac OS X | Like the Windows GUI, provides the basic features of an IDE. |

Why RStudio?

In its short existence, the RStudio project already provides nearly all the desired features for an IDE in a novel way, making it easier and more productive to use R. Further, new features are being added all the time. Some current highlights are:

  • The main components of an IDE are all nicely integrated into a four-pane layout that includes a console for interactive R sessions, a tabbed source-code editor to organize a project’s files, and tabbed panes within notebooks to organize less central components.

  • The source-code editor is easy to use, feature-rich, has excellent code-navigation features, and is well-integrated into the built-in console.

  • The console and source-code editor are tightly linked to R’s internal help system through tab completion and the help page viewer component.

  • The project feature makes it easy to organize different workflows. Setting up different projects is a snap, and switching between them is even easier.

  • RStudio provides many convenient and easy-to-use administrative tools for managing packages, the workspace, files, and more.

  • The IDE is available for the three main operating systems and can be run through a web browser for remote access.

  • RStudio is much easier to learn than Emacs/ESS, easier to configure and install than Eclipse/StatET, has a much better editor than JGR, is better organized than SciViews, and unlike Notepad++ and RGui, is available on more platforms than just Windows.

The RStudio program can be run on the desktop or through a web browser. The desktop version is available for Windows, Mac OS X, and Linux platforms and behaves similarly across all platforms, with minor differences for keyboard shortcuts.

To support so many platforms, RStudio leverages numerous existing web technologies in its design. For the desktop version, it renders the web-based interface inside an industry-standard HTML widget provided by Qt (a cross-platform application and UI framework). Consequently, R users can have a feature-rich and consistent programming environment for R their way, whether desktop- or web-based. Web-based usage is done through a trusted server within a department or organization (though a “cloud” service may be forthcoming).

RStudio is the brainchild of J. J. Allaire, who, with his brother, previously had tremendous success developing the influential ColdFusion IDE and scripting language for web development. Allaire is currently joined by the very able Joseph Cheng, Joshua Paulson, and Paul DiCristina. In the short time that their initial beta has been available, they have proven to be very responsive to user input. RStudio is under active development. As such, elements discussed in this book may be changed by the time you are reading it. Sorry…but you’ll likely be better off with the new feature than my description of the old one.

Like R, RStudio is an open-source project. Its stated goal—which it is already meeting—is “...to develop a powerful tool that supports the practices and techniques required for creating trustworthy, high-quality analysis.” The codebase is released under the AGPLv3 license and is available from GitHub (https://github.com/rstudio/rstudio). RStudio is built on top of many other open-source projects. Most visible of these are GWT, Google’s Web Toolkit; Qt, the graphical toolkit of Nokia; and Ace, the JavaScript code editor (http://ace.ajax.org). Other leveraged projects are listed in RStudio’s About dialog. The bulk of the code is written in C++ and Java, the language for working with GWT.

Using RStudio

We will reverse things slightly by beginning with the process of starting RStudio, and postpone any installation issues for a bit. As RStudio can be used from the desktop or through a server, there are two ways of starting it.

Desktop Version

For the desktop version, RStudio is started like most other applications. In Figure 1-1, we see the application running under a version of Windows. There it was started by clicking on the menu item left after installation. For Mac OS X users, one clicks on the RStudio icon in the Applications list. For Linux users, the command rstudio will open the window. It may also be installed with a menu item, as is done with Ubuntu, where it appears under Programming.

Figure 1-1. RStudio on initial startup; the main interface has four panes (one hidden in this screenshot), an application toolbar, and in some cases, a menu bar

In Figure 1-1 we see three main panes: the Console, which should look familiar to any R user; a tabbed Workspace pane (with no items, as the initial workspace is empty) and the History interface. The latter two are part of notebooks that can contain multiple panes. The Source pane, or code editor, is not open in the screenshot, as no files are open for editing or viewing.

Server Version

Starting the server version requires one to know the appropriate URL for the resource. We used a local URL for this book, but the real value comes from using RStudio as a resource on the wider internet. When accessing RStudio, one must first authenticate. The basic screen to do so looks like Figure 1-2. Authentication depends on the server, but the default is to authenticate against the user accounts on the machine, so the web administrator should have provided a secure means to access RStudio.

Figure 1-2. Login screen for the server version of RStudio

Once authenticated, the basic layout looks similar to that of the desktop version—compare the basic elements of Figure 1-1 to Figure 1-3 to see this.

Figure 1-3. Screenshot of RStudio startup run through a web browser; here, the Source component is hidden, as no files are currently being edited

Note

When using the server version, only one instance per user may be opened. If a new session is started—on a different machine, or even if just in a different tab of the same browser—the old one is disconnected and a notification issued.

Which Workspace?

When R is started, it follows this process:

  • R is started in the working directory.

  • If present, the .Rprofile file’s commands are executed.

  • If present, the .RData file is loaded.

  • Other actions described in ?Startup are followed.

When R quits, the user is asked “Save workspace image?” If the answer is yes, R writes the contents of the workspace to an .RData file, so that the workspace persists between sessions when R is restarted. (One can also initiate this with save.image.)

This process allows R users to place commands they want to run in every session in an .Rprofile file, and to have per-directory .RData files, so that different global workspaces can be used for different projects.
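
As a rough sketch of this per-directory setup, the following shell commands create a project directory with its own .Rprofile. The directory name and the two R commands placed in the profile are illustrative assumptions, not anything R or RStudio requires. If Rscript is on the search path, the last step starts a session in that directory so that the profile is read.

```shell
# Hypothetical project directory; any directory works.
project_dir="${TMPDIR:-/tmp}/rstudio-startup-demo"
mkdir -p "$project_dir"

# Commands in .Rprofile run at the start of every R session launched
# with this directory as the working directory.
cat > "$project_dir/.Rprofile" <<'EOF'
options(digits = 4)
message("Loaded project .Rprofile")
EOF

# If R is installed, start a session there and query the option that
# the profile is meant to set.
if command -v Rscript >/dev/null 2>&1; then
  (cd "$project_dir" && Rscript -e 'getOption("digits")') ||
    echo "The R session did not run cleanly" >&2
fi
```

A second project directory with a different .Rprofile would get its own settings in the same way, which is the separation that RStudio's projects build on.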

Projects

RStudio provides a very useful “project” feature that allows a user to switch quickly between projects (Organizing Activities with Projects). Each project may have its own working directory, workspace, and collection of files in the Source component. The current project name is listed on the far right of the main application toolbar in a combobox that allows one to switch between open projects, open an existing project, or create a new project.

Which R?

RStudio does not require a special version of R to run, as long as it is a fairly modern one (R 2.11.1 or later). It will work with binary versions from CRAN or user-compiled versions. As such, when RStudio starts up, it must be able to locate a version of R, which could possibly reside in many different places. Usually RStudio just finds the right one, but one can bypass the search process. The online document at http://www.rstudio.org/docs/advanced/versions_of_r details how to specify which R installation to use. In short, it depends on the underlying operating system. For Windows desktop users, it can be specified in the Options dialog (see The Options Dialog), or chosen if the Ctrl key is held on startup. For Linux and Mac OS X users, one can set an environment variable, as seen here:

$ export RSTUDIO_WHICH_R=/usr/local/bin/R

Web-based users really don’t have a choice, as this is determined by who configures the server.
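
The export line above assumes that the given path exists. A slightly more defensive sketch checks a few candidate locations first and exports the first one that is executable. The function name pick_r and the second candidate path are assumptions for illustration; only RSTUDIO_WHICH_R and /usr/local/bin/R come from the text.

```shell
# Hypothetical helper: print the first executable file among its arguments.
pick_r() {
  for candidate in "$@"; do
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Candidate locations are assumptions; adjust them to the local system.
if chosen=$(pick_r /usr/local/bin/R /usr/bin/R); then
  export RSTUDIO_WHICH_R="$chosen"
  echo "RStudio will use: $RSTUDIO_WHICH_R"
else
  echo "No R binary found in the candidate locations" >&2
fi
```

Placing these lines in a shell startup file would make the choice apply to every RStudio launched from that shell.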

Layout of the Components

The RStudio interface consists of several main components sitting below a top-level toolbar and menu bar. Although this placement can be customized, the default layout uses four main panes, two on the left and two on the right.

The Console pane is somewhat privileged: it is always visible, and it has a title bar. For the other components, their tab serves as a title bar. These panes have page-specific toolbars (perhaps more than one)—which in the case of the Source pane are also context-specific.

The user may change the default dimensions for each of the panes, as follows. There is an adjustable divider appearing in the middle of the interface between the left and right sides that allows the user to adjust the horizontal allocation of space. Furthermore, each side then has another divider to adjust the vertical space between its two panes. As well, the title bar of each pane has icons to shade a component, maximize a component vertically, or share the space.

Keyboard Shortcuts

One can easily switch between components using the mouse. As well, the View menu has subitems for this task. For power users, the keyboard shortcuts listed in Table 1-2 are useful. (A full list of keyboard shortcuts is available through the Help > Keyboard Shortcuts menu item.)

Table 1-2. Keyboard shortcuts for navigation between major components
| Description | Windows & Linux | Mac |
| --- | --- | --- |
| Move cursor to Source Editor | Ctrl+1 | Ctrl+1 |
| Move cursor to Console | Ctrl+2 | Ctrl+2 |
| Show workspace | Ctrl+3 | Ctrl+3 |
| Show history | Ctrl+4 | Ctrl+4 |
| Show files | Ctrl+5 | Ctrl+5 |
| Show plots | Ctrl+6 | Ctrl+6 |
| Show packages | Ctrl+7 | Ctrl+7 |
| Show help | Ctrl+8 | Ctrl+8 |
| Show Git/SVN | Ctrl+9 | Ctrl+9 |
| Shell | Ctrl+Shift+H | Cmd+Shift+H |

The Options Dialog

RStudio preferences are adjusted through the Options dialog. The dialog has five panels, which cover general properties, editing properties (Figure 3-4), appearance properties, pane layout (Figure 1-4), and version control (which requires additional support tools to be installed).

The pane layout allows the user to determine which panes go in which corners, and, for the supplemental panes (not the Console or Source editor), where those panes’ tabs appear. One modifies a placement simply by adjusting a combobox, or by checking one of the checkboxes. In Figure 1-4, the choices put the code editor on the right, the console in the upper right, and the file browser on the upper left. There are many examples of pane placement on http://rstudio.org/screenshots/.

The appearance panel of the Options dialog allows one to set the default font size and to change the theme used in the console and source-code editor. This book uses the default TextMate theme for its screenshots.

Figure 1-4. Pane preference dialog for adjusting component layout

Installing RStudio

Installing RStudio is usually a straightforward process.

First, RStudio requires a working, relatively modern R installation. If that is not already present, then one should consult http://cran.r-project.org to learn how to install R for the given operating system. For Windows and Mac OS X, one can simply download a self-installing binary; for Linux, installation varies. For the Debian distribution (including Ubuntu), the R system can be installed using the regular package-management tools. Of course, as R is open source, one can also compile and install it using the source code.
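
Before installing RStudio, one can check from a shell whether R is already on the search path. This is only a sketch: the function name check_r is hypothetical, and r-base is the package name under which Debian and Ubuntu distribute R. The sketch reports which R is installed but does not compare it against the minimum version that RStudio needs.

```shell
# Hypothetical helper: report the installed R, or suggest how to get one.
check_r() {
  if command -v R >/dev/null 2>&1; then
    # The first line of the version banner names the R release.
    R --version | head -n 1
  else
    echo "R not found; on Debian or Ubuntu try: sudo apt-get install r-base"
    return 1
  fi
}

check_r || true
```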

The RStudio package is available for download from http://www.rstudio.org/download/. There is a choice between a Desktop version and a Server version. The Desktop version is appropriate for single-user use. The files come in a common format for binary installation (e.g., exe, dmg, deb, or rpm). One downloads the file and installs it as any other program.

For those searching out the latest features, follow the link on http://www.rstudio.org/download/daily to get the binaries for the most recent (but not necessarily stable) build.

Installing a server version requires more work and care. Some directions are given at http://rstudio.org/docs/.

One can also install RStudio from its source code. A link for the source “tarball” for the current stable version appears on the appropriate download page. For the adventurous, the latest development build files are available from https://github.com/rstudio/rstudio. Installation details are in the INSTALL file accompanying the source code. The same source is used to compile both the Desktop and Server version.

As RStudio depends on some of the latest features of many moving parts, such as GWT, there can be issues with compiling from the source. The support forums (http://support.rstudio.org/) are an excellent place to find specific answers to any issues.

Logging

RStudio creates hidden files for itself to store information, including logging information. When there are issues at startup, the log can be consulted for direction as to what is going wrong.

For desktop users on Mac and Linux, the log directory is ~/.rstudio-desktop/log. On Windows, it is %localappdata%\RStudio-Desktop\log (Windows Vista and 7) or %USERPROFILE%\Local Settings\Application Data\RStudio-Desktop\log (XP).

In the application’s menu bar, the Help > Diagnostics item can be used to find the log files.
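
On Mac and Linux, the log directory can also be inspected from a shell. The sketch below lists its contents with the most recently modified file first. The function name show_logs is an assumption for illustration; the directory is the one given above.

```shell
# Hypothetical helper: list the files in a log directory, newest first.
show_logs() {
  dir="$1"
  if [ -d "$dir" ]; then
    ls -t "$dir"
  else
    echo "No log directory at $dir" >&2
    return 1
  fi
}

# Desktop log location on Mac and Linux, as given in the text.
show_logs "$HOME/.rstudio-desktop/log" || true
```

The file at the top of the listing is usually the place to start reading when RStudio fails at startup.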

Updating RStudio

Updating RStudio is also straightforward.

To see if an update is available, the Help > Check for Updates menu item will open a dialog with update information.

If an update is available, one can stop RStudio, install the new version, then restart. RStudio writes session information to the user’s home directory (e.g., to the directory ~/.rstudio-desktop). This will persist between upgrades.
