11. Group Manipulation

A general rule of thumb for data analysis is that manipulating the data (or “data munging,” a term popularized by Simple founder Josh Reich) consumes about 80 percent of the effort. This often requires repeated operations on different sections of the data, a pattern Hadley Wickham named “split-apply-combine.” That is, we split the data into discrete sections based on some criterion, such as the values of a grouping variable, apply a transformation of some kind to each section, and then combine the results back together. This is somewhat like the MapReduce paradigm of Hadoop. There are many different ways to iterate over data in R, and we look at some of the more convenient functions. Much of the functionality seen in this chapter is constantly being improved, with ...
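As a minimal sketch of split-apply-combine using only base R, the three steps can be written out explicitly. The data frame `sales` and its columns `region` and `amount` are hypothetical toy data, not from the original text: `split()` performs the split, `lapply()` applies a function to each piece, and `unlist()` combines the per-group results.

```r
# Hypothetical toy data: sales amounts recorded for three regions
sales <- data.frame(
  region = c("East", "West", "East", "North", "West", "North"),
  amount = c(10, 20, 30, 40, 50, 60)
)

# Split: one data.frame per region (groups are ordered alphabetically)
pieces <- split(sales, sales$region)

# Apply: compute the mean amount within each piece
means <- lapply(pieces, function(piece) mean(piece$amount))

# Combine: collapse the list of results into a single named vector
result <- unlist(means)
print(result)  # East 20, North 50, West 35
```

The same values can be obtained in a single call with `tapply(sales$amount, sales$region, mean)`, which carries out all three steps internally.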

From R for Everyone: Advanced Analytics and Graphics, 2nd Edition.
