Optimizing Your R Code
Once you figure out where your program is spending its time, you can focus on improving those areas. This section describes some common causes for poor performance and shows how to resolve them.
Using Vector Operations
R is a functional language with built-in support for vector operations. Whenever possible, you should use vector operations in your code and not write iterative algorithms. This section explains why.
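As a small illustration of what a "vector operation" means here, the sketch below applies arithmetic to whole vectors at once instead of looping over their elements. The variables prices and quantities are made up for this example and do not come from the text.

```r
# Hypothetical example data (not from the original text)
prices <- c(2.50, 4.00, 1.25)
quantities <- c(4, 1, 8)

# Arithmetic operators work element by element on whole vectors,
# so no explicit loop is needed
totals <- prices * quantities
print(totals)   # 10, 4, 10

# Many built-in functions are vectorized in the same way
roots <- sqrt(c(1, 4, 9))
print(roots)    # 1, 2, 3
```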
Iterative algorithms and vector operations
Let’s consider a simple problem: calculating a vector with the square of every integer between 1 and n. Here is a naive implementation:
> naive.vector.of.squares <- function(n) {
+   v <- 1:n
+   for (i in 1:n)
+     v[i] <- v[i]^2
+   v
+ }
> naive.vector.of.squares(10)
 [1]   1   4   9  16  25  36  49  64  81 100
How does the performance of this function vary with n? Let’s do a quick experiment:
> # 10,000 values
> system.time(naive.vector.of.squares(10000))
   user  system elapsed
  0.037   0.000   0.037
> # 10,000,000 values
> system.time(naive.vector.of.squares(10000000))
   user  system elapsed
 30.211   0.233  30.178
As you can see, the time required to compute the vector grows roughly linearly with the size of the vector (n): 1,000 times as many values took roughly 800 times as long. This makes sense: R is looping through all n elements in the vector and changing each element one at a time. (Note that R doesn’t actually copy the vector v repeatedly inside the loop; see Objects Are Copied in Assignment Statements for more about how this works.)
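Two measurements are a thin basis for a scaling claim, so one way to check it yourself is to time the function at several sizes. The following is a minimal sketch: the sizes are arbitrary choices for illustration, and the absolute timings will depend on your machine and R version.

```r
# Same function as in the text above
naive.vector.of.squares <- function(n) {
  v <- 1:n
  for (i in 1:n)
    v[i] <- v[i]^2
  v
}

# Arbitrary sizes chosen for illustration, each 10 times the previous one
sizes <- c(1e4, 1e5, 1e6)

# Elapsed wall-clock seconds for each size
elapsed <- sapply(sizes, function(n) {
  system.time(naive.vector.of.squares(n))[["elapsed"]]
})

print(data.frame(n = sizes, elapsed = elapsed))
```

If the running time is linear in n, each row's elapsed time should be about 10 times the previous one, although very short timings are noisy.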
It turns out that there is a much better way to implement ...