Chapter 10. Speed

As a data scientist, you need speed. You can work with bigger data and do more ambitious tasks when your code runs fast. This chapter will show you a specific way to write fast code in R. You will then use the method to simulate 10 million plays of your slot machine.

Vectorized Code

You can write a piece of code in many different ways, but the fastest R code will usually take advantage of three things: logical tests, subsetting, and element-wise execution. These are the things that R does best. Code that uses these things usually has a certain quality: it is vectorized; the code can take a vector of values as input and manipulate each value in the vector at the same time.

To see what vectorized code looks like, compare these two examples of an absolute value function. Each takes a vector of numbers and transforms it into a vector of absolute values (e.g., positive numbers). The first example is not vectorized; abs_loop uses a for loop to manipulate each element of the vector one at a time:

abs_loop <- function(vec){
  for (i in 1:length(vec)) {
    if (vec[i] < 0) {
      vec[i] <- -vec[i]
    }
  }
  vec
}

The second example, abs_set, is a vectorized version of abs_loop. It uses logical subsetting to manipulate every negative number in the vector at the same time:

abs_sets <- function(vec){
  negs <- vec < 0
  vec[negs] <- vec[negs] * -1
  vec
}

abs_set is much faster than abs_loop because it relies on operations that R does quickly: logical tests, subsetting, and element-wise execution.

You ...

Get Hands-On Programming with R now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.