Efficient Base R
Logical AND and OR
The logical AND (&) and OR (|) operators are vectorized functions and are typically used during multicriteria subsetting operations. The following code, for example, returns TRUE for all elements of x less than 0.4 or greater than 0.6:
x < 0.4 | x > 0.6
#> [1]  TRUE FALSE  TRUE
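To make the comparison above concrete, here is a minimal sketch of multicriteria subsetting. The values of x are an assumption made for illustration, picked so that the comparison reproduces the output shown; they are not taken from the original text.

```r
# Hypothetical values, chosen to match the output shown above
x <- c(0.2, 0.5, 0.7)

# Element-wise comparison: one logical value per element of x
outside <- x < 0.4 | x > 0.6
outside
#> [1]  TRUE FALSE  TRUE

# Use the logical vector to keep only the matching elements
x[outside]
#> [1] 0.2 0.7
```

Because | works element by element, the result has the same length as x and can be used directly as a subsetting index.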
When R executes this comparison, it will always calculate x > 0.6 regardless of the value of x < 0.4. In contrast, the nonvectorized versions, && and ||, only execute the second component if needed. This is efficient and leads to neater code:
# We only calculate the mean if data doesn't contain NAs
if (!anyNA(x) && mean(x) > 0) {
  # Do something
}
compared to
if (!anyNA(x)) {
  if (mean(x) > 0) {
    # do something
  }
}
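The short-circuit behavior described above can be checked directly. The sketch below uses a hypothetical helper, counted_true(), that records how often it is called; it is an illustration only, not part of base R.

```r
# Hypothetical helper: counts how many times it has been evaluated
calls <- 0
counted_true <- function() {
  calls <<- calls + 1
  TRUE
}

# Short-circuit AND: the left side is FALSE, so the right side is skipped
FALSE && counted_true()
#> [1] FALSE
calls
#> [1] 0

# Vectorized AND: both sides are always evaluated
FALSE & counted_true()
#> [1] FALSE
calls
#> [1] 1
```

The same reasoning is what lets the single-line if statement skip mean(x) whenever anyNA(x) is TRUE.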
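However, care must be taken not to use && or || on vectors. In versions of R before 4.3.0, they evaluate only the first element of each vector and so give an incorrect answer; from R 4.3.0 onward, a condition of length greater than one is an error. The older behavior is illustrated here: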
x < 0.4 || x > 0.6
#> [1] TRUE
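When a single TRUE or FALSE is needed from a whole vector, for example inside if(), one common approach is to collapse the element-wise result with any() or all(). The sketch below assumes illustrative values for x that are not taken from the original text.

```r
# Hypothetical values for illustration
x <- c(0.2, 0.5, 0.7)

# Is at least one element less than 0.4 or greater than 0.6?
any(x < 0.4 | x > 0.6)
#> [1] TRUE

# Do all elements satisfy the condition?
all(x < 0.4 | x > 0.6)
#> [1] FALSE
```

This keeps the vectorized | for the element-wise test and makes the reduction to a single value explicit.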
Row and Column Operations
In data analysis, we often want to apply a function to each column or row of a dataset. For example, we might want to calculate the column or row sums. The apply() function makes this type of operation straightforward.
# Second argument: 1 -> rows. 2 -> columns
apply(data_set, 1, function_name)
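As a concrete instance of the call above, the following sketch applies sum() to the rows and max() to the columns of a small matrix. The matrix is a hypothetical stand-in for data_set, chosen only so the results are easy to verify by hand.

```r
# Hypothetical 2 x 3 matrix standing in for data_set
data_set <- matrix(1:6, nrow = 2)
data_set
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6

# Margin 1: apply sum() to each row
apply(data_set, 1, sum)
#> [1]  9 12

# Margin 2: apply max() to each column
apply(data_set, 2, max)
#> [1] 2 4 6
```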
There are optimized functions for calculating row and column sums/means (rowSums(), colSums(), rowMeans(), and colMeans()) that should be used whenever possible. The package matrixStats contains many optimized row/column functions.
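The following sketch compares apply() with the optimized rowSums() on the same data. The matrix size and the random values are assumptions made for illustration. Timings vary by machine, so none are shown.

```r
# Hypothetical 1000 x 1000 matrix of random values
set.seed(1)
m <- matrix(runif(1e6), nrow = 1000)

# Both approaches compute the same row sums
row_apply <- apply(m, 1, sum)
row_fast <- rowSums(m)
all.equal(row_apply, row_fast)
#> [1] TRUE

# Compare elapsed times on your own machine
system.time(apply(m, 1, sum))
system.time(rowSums(m))
```

On most machines rowSums() should be noticeably faster, since it avoids calling an R function once per row.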
Matrices
A matrix is ...