Chapter 6. Data Transformations
Introduction
This chapter is all about the apply functions: apply, lapply, sapply, tapply, mapply; and their cousins, by and split. These functions let you take data in great gulps and process the whole gulp at once. Where traditional programming languages use loops, R uses vectorized operations and the apply functions to crunch data in batches, greatly streamlining the calculations.
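To make the contrast concrete, here is a minimal sketch that computes the same per-element means twice, once with an explicit loop and once with sapply. The list `scores` and its values are invented for illustration and do not come from the text.

```r
# Hypothetical data: a named list of numeric vectors (an assumption for this sketch)
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5, 5, 5, 5))

# Loop version: preallocate the result, then fill it one element at a time
means_loop <- numeric(length(scores))
names(means_loop) <- names(scores)
for (i in seq_along(scores)) {
  means_loop[i] <- mean(scores[[i]])
}

# Apply version: a single call processes the whole list
means_apply <- sapply(scores, mean)

print(means_loop)
print(means_apply)
```

Both versions produce the same named vector; the sapply call simply leaves the bookkeeping (indexing, preallocation, naming) to R.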
Defining Groups Via a Factor
An important idiom of R is using a factor to define a group. Suppose we have a vector and a factor, both of the same length, that were created as follows:
> v <- c(40, 2, 83, 28, 58)
> f <- factor(c("A", "C", "A", "B", "C"))
We can visualize the vector elements and factor levels side by side, like this:
Vector | Factor |
---|---|
40 | A |
2 | C |
83 | A |
28 | B |
58 | C |
The factor level identifies the group of each vector element: 40 and 83 are in group A; 28 is in group B; and 2 and 58 are in group C.
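As a quick check of this grouping, the following sketch uses split to separate the vector by the factor. It assumes the pairing shown in the table above (83 paired with level A); the variable name `groups` is introduced here only for illustration.

```r
# Vector and grouping factor, paired as in the table above
v <- c(40, 2, 83, 28, 58)
f <- factor(c("A", "C", "A", "B", "C"))

# split returns a list with one element per factor level
groups <- split(v, f)

print(groups$A)  # elements of v whose factor level is A
print(groups$B)  # elements of v whose factor level is B
print(groups$C)  # elements of v whose factor level is C
```

Each list element holds the values of v whose position in f carries that level, in their original order.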
In this book, I refer to such factors as grouping factors. They effectively slice and dice our data by putting them into groups. This is powerful because processing data in groups occurs often in statistics: when comparing group means, comparing group proportions, performing ANOVA, and so forth.
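For instance, comparing group means starts with computing them. The sketch below does that with tapply, reusing the example vector and the grouping shown in the table above; `group_means` and `group_sizes` are names chosen for this illustration.

```r
# Same example data, with the grouping from the table above
v <- c(40, 2, 83, 28, 58)
f <- factor(c("A", "C", "A", "B", "C"))

# tapply applies a function to each group that the factor defines
group_means <- tapply(v, f, mean)

# table counts how many elements fall into each level
group_sizes <- table(f)

print(group_means)
print(group_sizes)
```

The result of tapply is indexed by factor level, so group_means["A"] is the mean of the elements in group A.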
This chapter has recipes that use grouping factors to split vector elements into their respective groups (Recipe 6.1), apply a function to groups within a vector (Recipe 6.5), and apply a function to groups of rows within a data frame (Recipe 6.6). In other chapters, the same idiom is used to test ...