Chapter 11. Group Manipulation

A general rule of thumb for data analysis is that manipulating the data (or “data munging,” a term coined by Simple founder Josh Reich) consumes about 80% of the effort. This often requires repeated operations on different sections of the data, a pattern Hadley Wickham named “split-apply-combine.” That is, we split the data into discrete sections based on some metric, apply a transformation of some kind to each section, and then combine the results from all the sections. This is somewhat like the MapReduce¹ paradigm of Hadoop.² There are many different ways to iterate over data in R, and we will look at some of the more convenient functions.
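The three steps can be sketched with base R alone. This is only a minimal illustration, not necessarily the approach the chapter goes on to use: the `sales` data frame, its `region` and `amount` columns, and the values in it are invented for the example.

```r
# Toy data invented for illustration: sale amounts by region.
sales <- data.frame(
  region = c("East", "West", "East", "West", "East"),
  amount = c(10, 20, 30, 40, 50)
)

# Split: break the data into one data.frame per region.
pieces <- split(sales, sales$region)

# Apply: summarize each piece independently of the others.
summaries <- lapply(pieces, function(piece) {
  data.frame(
    region = piece$region[1],
    total  = sum(piece$amount),
    mean   = mean(piece$amount)
  )
})

# Combine: stack the per-region summaries into a single data.frame.
result <- do.call(rbind, summaries)
print(result)
```

Here `split` handles the splitting, `lapply` applies the summary to each section, and `do.call(rbind, ...)` combines the pieces. Base R's `aggregate` and similar convenience functions wrap the same pattern into a single call.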

1. MapReduce is where data are split into discrete sets, computed on, and then ...

