The split, apply, and combine (SAC) pattern

Many data analysis problems follow a processing pattern known as split-apply-combine. In this pattern, the data is analyzed in three steps:

  1. A data set is split into smaller pieces
  2. Each of these pieces is operated upon independently
  3. All of the results are combined back together and presented as a single unit
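
The three steps can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a definitive implementation: the sample `records` and the names `groups`, `sums`, and `result` are assumptions made up for this example. It sums values per group key.

```python
from collections import defaultdict

# Hypothetical sample data: (group key, value) pairs.
records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]

# Split: partition the values into smaller pieces, one per key.
groups = defaultdict(list)
for key, value in records:
    groups[key].append(value)

# Apply: operate on each piece independently (here, a sum).
sums = {key: sum(values) for key, values in groups.items()}

# Combine: gather the per-group results into a single, ordered unit.
result = sorted(sums.items())
print(result)  # [('a', 4), ('b', 7), ('c', 4)]
```

Each sum depends only on its own group, so the apply step could run on the pieces in any order, or in parallel.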

The following diagram demonstrates a simple split-apply-combine process to sum groups of numbers:

This process is very similar to the concepts in MapReduce. In MapReduce, massive sets of data that are too big for a single computer are divided into pieces and dispatched to many systems ...

