Efficient Base R

Logical AND and OR

The logical AND (&) and OR (|) operators are vectorized and are typically used in multicriteria subsetting operations. The following code, for example, returns TRUE for all elements of x less than 0.4 or greater than 0.6:

x < 0.4 | x > 0.6
#> [1]  TRUE FALSE  TRUE
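To make the result above reproducible, here is a minimal sketch. The values of x are an assumption, chosen only so that the comparison matches the printed output; the name outside is likewise illustrative.

```r
# Hypothetical values, picked to reproduce the output shown above
x <- c(0.2, 0.5, 0.8)

# Element-wise OR: one logical value per element of x
outside <- x < 0.4 | x > 0.6
outside
#> [1]  TRUE FALSE  TRUE

# The logical vector can then be used to subset x
x[outside]
#> [1] 0.2 0.8
```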

When R executes this comparison, it always calculates x > 0.6, regardless of the value of x < 0.4. In contrast, the nonvectorized versions, && and ||, evaluate the second component only if it is needed to determine the result. This is efficient and leads to neater code:

# We only calculate the mean if data doesn't contain NAs
if(!anyNA(x) && mean(x) > 0) {
  # Do something
}

compared to

if(!anyNA(x)) {
  if(mean(x) > 0) {
    # do something
  }
}
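The difference in evaluation can be observed directly. The sketch below uses a hypothetical helper, counted_true(), that increments a counter each time it is called, so we can see whether the right-hand side was evaluated.

```r
# Hypothetical helper: counts how often it is evaluated
calls <- 0
counted_true <- function() {
  calls <<- calls + 1
  TRUE
}

# && stops as soon as the left-hand side is FALSE
short <- FALSE && counted_true()
calls
#> [1] 0

# & always evaluates both sides
full <- FALSE & counted_true()
calls
#> [1] 1
```

Both expressions return FALSE, but only the second one calls the helper.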

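However, care must be taken not to use && or || on vectors. In versions of R before 4.3.0, they evaluate only the first element of each vector, which gives an incorrect answer; from R 4.3.0 onward, supplying a vector of length greater than one is an error. The older behavior is illustrated here: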

x < 0.4 || x > 0.6
#> [1] TRUE
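When a single TRUE or FALSE is needed from a vector comparison, for instance inside if(), one option is to reduce the vectorized result explicitly with any() or all(). The following is a sketch with assumed values for x, not the only way to handle this.

```r
# Hypothetical values for illustration
x <- c(0.2, 0.5, 0.8)

# Reduce the element-wise comparison to a single logical value
any_outside <- any(x < 0.4 | x > 0.6)  # TRUE: at least one element matches
all_outside <- all(x < 0.4 | x > 0.6)  # FALSE: 0.5 fails both tests

if (any_outside) {
  message("At least one value lies outside [0.4, 0.6]")
}
```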

Row and Column Operations

In data analysis, we often want to apply a function to each column or row of a dataset. For example, we might want to calculate the column or row sums. The apply() function makes this type of operation straightforward.

# Second argument: 1 -> rows. 2 -> columns
apply(data_set, 1, function_name)
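As a concrete illustration, the sketch below applies max() over the rows and then the columns of a small matrix. The matrix is a hypothetical stand-in for data_set.

```r
# Hypothetical 2 x 3 matrix standing in for data_set
# (matrix() fills column by column)
data_set <- matrix(1:6, nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# Margin 1: apply max() to each row
row_max <- apply(data_set, 1, max)
row_max
#> [1] 5 6

# Margin 2: apply max() to each column
col_max <- apply(data_set, 2, max)
col_max
#> [1] 2 4 6
```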

There are optimized functions for calculating row and column sums/means (rowSums(), colSums(), rowMeans(), and colMeans()) that should be used whenever possible. The package matrixStats contains many optimized row/column functions.
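The optimized functions give the same results as the corresponding apply() calls. The sketch below checks this on a randomly generated matrix; the matrix size and the seed are arbitrary assumptions, and the block uses base R only.

```r
set.seed(1)
# Hypothetical 100 x 100 matrix of uniform random numbers
m <- matrix(runif(1e4), nrow = 100)

# Row sums: general-purpose apply() versus the dedicated function
via_apply   <- apply(m, 1, sum)
via_rowSums <- rowSums(m)
same_sums <- isTRUE(all.equal(via_apply, via_rowSums))

# Column means: the same comparison with margin 2
same_means <- isTRUE(all.equal(apply(m, 2, mean), colMeans(m)))
```

Wrapping each call in system.time() is one way to compare their speed on your own data.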

Matrices

A matrix is ...
