The split, apply, and combine (SAC) pattern
Many data analysis problems follow a data-processing pattern known as split-apply-combine. In this pattern, data is analyzed in three steps:
- A data set is split into smaller pieces
- Each of these pieces is operated on independently
- All of the results are combined back together and presented as a single unit
The following diagram illustrates a simple split-apply-combine process that sums groups of numbers:
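The three steps of summing groups of numbers can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reproduction of the diagram: the `split_apply_combine` helper and the sample `(key, value)` pairs are hypothetical, and each group is identified by a key.

```python
from collections import defaultdict


def split_apply_combine(records, func):
    """Hypothetical helper: group (key, value) pairs, apply func per group, combine."""
    # Split: collect all values that share a key into one piece per key
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    # Apply: run func on each piece independently of the others
    # Combine: gather the per-group results into a single dict
    return {key: func(values) for key, values in groups.items()}


# Hypothetical sample data: each pair is (group key, number)
data = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
print(split_apply_combine(data, sum))  # {'a': 4, 'b': 7, 'c': 4}
```

Because the apply step only sees one group at a time, `sum` can be swapped for any function that reduces a list, such as `max` or `len`, without changing the split or combine logic.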
This process closely resembles the concepts behind MapReduce. In MapReduce, massive data sets that are too big for a single computer are divided into pieces and dispatched to many systems ...
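The parallel between the two can be sketched on a single machine. This is a hedged sketch, not a real MapReduce system: a thread pool stands in for the separate computers, and the `chunk`, `map_chunk`, `merge`, and `map_reduce_sum` names, along with the `chunk_size` and `workers` parameters, are hypothetical. Each chunk is mapped to partial per-key sums independently, and the partial results are then reduced into one answer.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def chunk(records, size):
    # Split: divide the data set into fixed-size pieces
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(piece):
    # Map: each worker computes partial per-key sums for its own piece only
    partial = defaultdict(int)
    for key, value in piece:
        partial[key] += value
    return partial


def merge(a, b):
    # Reduce: add two partial results together key by key
    merged = dict(a)
    for key, value in b.items():
        merged[key] = merged.get(key, 0) + value
    return merged


def map_reduce_sum(records, chunk_size=2, workers=4):
    pieces = chunk(records, chunk_size)
    # Threads stand in for the many machines of a real cluster
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, pieces))
    return reduce(merge, partials, {})


# Hypothetical sample data: each pair is (group key, number)
data = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
print(map_reduce_sum(data))  # {'a': 4, 'b': 7, 'c': 4}
```

The result matches a single-pass grouping because addition is associative, so partial sums computed on separate pieces can be merged in any grouping without changing the totals.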