R Language Fundamentals
In this chapter we introduce the basic language data types and discuss
their capabilities and structures. Topics such as flow control, iteration,
subsetting and exception handling are then presented. R directly supports two
different object-oriented programming (OOP) paradigms, which are discussed
in detail in Chapter 3. Many operations in R are vectorized, and understanding
and using vectorization is an essential component of becoming a proficient
programmer.
The R language was primarily designed as a language for data manipulation,
modeling and visualization, and many of the data structures reflect this
view. However, R is itself a full-fledged programming language, with its own
idioms, much like any other programming language. In some ways R can be
considered a functional programming language, although it is not purely
functional. R supports a form of lexical scope that provides a useful paradigm
for encapsulating computations.
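As a minimal sketch of how lexical scope can encapsulate a computation, the
following code builds a counter whose state is visible only to the function
that updates it. The names make_counter, counter_a and counter_b are
hypothetical and are used only for illustration.

```r
# make_counter is a hypothetical helper, not part of R itself.
make_counter <- function() {
  count <- 0                  # local to this call of make_counter
  function() {
    count <<- count + 1       # updates count in the enclosing environment
    count
  }
}

counter_a <- make_counter()
counter_b <- make_counter()   # has its own, independent count

a1 <- counter_a()             # 1
a2 <- counter_a()             # 2
b1 <- counter_b()             # 1, unaffected by the calls to counter_a
```

Each call to make_counter creates a new environment, so the two counters do
not share state, and count cannot be reached directly from the workspace.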
R is an implementation of the S language (Becker et al., 1988; Chambers and
Hastie, 1992; Chambers, 1998). There is another implementation, a commercial
one available from Insightful Corporation, called S-PLUS. The two
implementations are quite similar, and much of the material covered here can
be used in either. However, there are many R-specific extensions that are used
in this monograph, and users of R are our intended audience.
2.1.1 A brief introduction to R
We presume a reasonable familiarity with R but there are a few points that
will help to clarify some of the discussion. When R is started, a workspace is
created and that workspace is where the user creates and manipulates
variables. This workspace is an environment, and an environment is a set of
bindings of names, or symbols, to values. The top-level workspace can be
accessed through its name, which is .GlobalEnv.
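The following is a small sketch of treating the workspace as an environment.
The variable name workspace_demo is hypothetical; the functions used (assign,
get, exists, globalenv and environmentName) are part of base R.

```r
# workspace_demo is a hypothetical variable name used only for this sketch.
assign("workspace_demo", 5, envir = .GlobalEnv)   # create a binding in the workspace

same_env <- identical(globalenv(), .GlobalEnv)    # two ways to reach the same environment
env_name <- environmentName(.GlobalEnv)           # "R_GlobalEnv"

# Look the binding up by name, without searching enclosing environments.
bound <- exists("workspace_demo", envir = .GlobalEnv, inherits = FALSE)
value <- get("workspace_demo", envir = .GlobalEnv)
```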
Assignment of a value to a variable is generally done with either the = (equals)
character or a special symbol that is the concatenation of less than and
minus, <-. Assignment creates a binding between a symbol and a value, in a
particular environment. Removal of bindings is done with the function rm. In
the next code chunk, we create a symbol x and assign to it the value 10. We
then create a second symbol, y, and assign to it the value of x.
> x = 10
> y = x
The value associated with y is a copy of the value associated with x, and
changes to x do not affect y.
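A short sketch of these copy semantics follows. The vectors v and w are
hypothetical additions, included to show that the same behavior holds when a
value is modified in place rather than rebound.

```r
x <- 10
y <- x        # y is bound to a copy of the value of x
x <- 20       # rebinding x leaves y alone

v <- c(1, 2, 3)
w <- v
v[1] <- 100   # modifying v does not alter w
```

After this code has run, y is still 10 and w is still the vector 1, 2, 3.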
The semantics of rm(x) are that the association between x and its value
is broken and the symbol x is removed from the environment, but nothing is
done to the value that x referred to. If this value can be accessed in other
ways, it will remain available. We provide an example in Section 188.8.131.52.
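As a minimal illustration, and not the example referred to above, the sketch
below removes a binding from a scratch environment while the value stays
reachable through a second name. The names e and kept are hypothetical.

```r
e <- new.env()                       # a scratch environment for the sketch
assign("x", c(1, 2, 3), envir = e)   # bind x to a value in e
kept <- get("x", envir = e)          # a second way to reach the same value

rm("x", envir = e)                   # break the binding between x and its value

# The symbol x is gone from e, but the value is still available via kept.
x_still_bound <- exists("x", envir = e, inherits = FALSE)
```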
Valid variable names, sometimes referred to as syntactic names, are any
sequence of letters, digits, the period and the underscore, but they cannot
begin with a digit or the underscore. If they begin with a period, the second
character cannot be a digit. Variable names that violate these rules must be
quoted (see the Quotes manual page) and the preferred quote is the backtick.
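One way to check these rules, sketched here under the assumption that
comparing a name with the result of make.names is an adequate test, is shown
below. The candidate names are hypothetical; make.names is part of base R and
returns its input unchanged only when it is already a syntactic name.

```r
candidates <- c("ok_name", ".hidden", "_foo", "10:10", ".2way")

# A name is treated as syntactic when make.names() leaves it unchanged.
is_syntactic <- candidates == make.names(candidates)
names(is_syntactic) <- candidates

# Backticks allow a non-syntactic name to be used as a variable.
`_foo` <- 10
foo_plus_one <- `_foo` + 1
```

Only the first two candidates pass the check; the others begin with an
underscore, begin with a digit, or have a digit after a leading period.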
> `_foo` = 10
> "10:10" = 20
> ls()
[1] "10:10" "Rvers" "_foo" "basename"
[5] "biocUrls" "repos" "x" "y"
Attributes can be attached to any R object except NULL and they are used
quite extensively. Attributes are stored, by name, in a list. All attributes can
be retrieved using attributes, or any particular attribute can be accessed or
modified using the attr function. Attributes can be used by programmers to
attach any sort of information they want to any R object. R uses attributes
for many things: the S3 class system is based largely on attributes, as are
the dimensions of arrays and the names on vectors, to name but a few.
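The sketch below shows a few of these uses. The object names and the "units"
attribute are hypothetical; dim, names and class are the standard attributes
mentioned above.

```r
m <- 1:6
dim(m) <- c(2, 3)                  # sets the "dim" attribute, so m is now a matrix
m_attrs <- attributes(m)           # a list with a single element named dim

v <- c(a = 1, b = 2)
v_names <- attr(v, "names")        # the same value that names(v) returns

f <- factor(c("lo", "hi", "lo"))
f_class <- attr(f, "class")        # the S3 class is stored as an attribute

attr(v, "units") <- "cm"           # "units" is an arbitrary, hypothetical attribute
```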
In the code below, we attach an attribute to x and then show how the
printing of x changes to reflect the fact that it has an attribute.
> x = 1:10
> attr(x, "foo") = 11