Chapter 6. Data Transformations

Introduction

This chapter is all about the apply functions: apply, lapply, sapply, tapply, mapply; and their cousins, by and split. These functions let you take data in great gulps and process the whole gulp at once. Where traditional programming languages use loops, R uses vectorized operations and the apply functions to crunch data in batches, greatly streamlining the calculations.
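To make the contrast with loops concrete, here is a minimal sketch that computes the same result three ways: with an explicit loop, with a vectorized call, and with sapply. The vector x and the variable names are illustrative assumptions, not taken from the recipes.

```r
# Illustrative data (an assumption for this sketch)
x <- c(1, 4, 9, 16)

# Loop style, as in a traditional language: one element at a time
roots_loop <- numeric(length(x))
for (i in seq_along(x)) {
  roots_loop[i] <- sqrt(x[i])
}

# Vectorized style: sqrt processes the whole vector in one call
roots_vec <- sqrt(x)

# Apply style: sapply calls the function on each element and
# simplifies the results into a vector
roots_apply <- sapply(x, sqrt)
```

All three produce the same values; the vectorized and apply forms simply leave the looping to R.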

Defining Groups Via a Factor

An important idiom of R is using a factor to define a group. Suppose we have a vector and a factor, both of the same length, that were created as follows:

> v <- c(40,2,83,28,58)
> f <- factor(c("A","C","A","B","C"))

We can visualize the vector elements and factor levels side by side, like this:

Vector  Factor
40      A
2       C
83      A
28      B
58      C

The factor level identifies the group of each vector element: 40 and 83 are in group A; 28 is in group B; and 2 and 58 are in group C.
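As a rough sketch of this grouping, the split function can pull the vector apart by factor level. The data is redefined here so that it matches the table above (83 in group A); the name groups is an assumption for illustration.

```r
# Data matching the side-by-side table: 40 and 83 in A, 28 in B, 2 and 58 in C
v <- c(40, 2, 83, 28, 58)
f <- factor(c("A", "C", "A", "B", "C"))

# split returns a list with one element per factor level,
# each holding the vector elements that belong to that group
groups <- split(v, f)

groups$A   # elements of v in group A
groups$B   # elements of v in group B
groups$C   # elements of v in group C
```

Here groups$A holds 40 and 83, groups$B holds 28, and groups$C holds 2 and 58.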

In this book, I refer to such factors as grouping factors. They effectively slice and dice our data by putting them into groups. This is powerful because processing data in groups occurs often in statistics: comparing group means, comparing group proportions, performing ANOVA, and so forth.
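Comparing group means, for instance, might look like the following sketch, which uses tapply with a grouping factor. The data repeats the small example from this section (with 83 in group A), and the name group_means is an assumption for illustration.

```r
# Same small example: 40 and 83 in A, 28 in B, 2 and 58 in C
v <- c(40, 2, 83, 28, 58)
f <- factor(c("A", "C", "A", "B", "C"))

# tapply groups the elements of v by the levels of f,
# then applies mean to each group
group_means <- tapply(v, f, mean)

group_means
```

The result is named by factor level: 61.5 for A, 28 for B, and 30 for C.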

This chapter has recipes that use grouping factors to split vector elements into their respective groups (Recipe 6.1), apply a function to groups within a vector (Recipe 6.5), and apply a function to groups of rows within a data frame (Recipe 6.6). In other chapters, the same idiom is used to test ...
