The split, apply, and combine (SAC) pattern

Many data analysis problems follow a processing pattern known as split-apply-combine. In this pattern, the analysis proceeds in three steps:

  1. A data set is split into smaller pieces
  2. Each of these pieces is operated upon independently
  3. All of the results are combined back together and presented as a single unit
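The three steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the book's own listing: the sample data and the column names `key` and `value` are hypothetical, and the steps are written out separately only to make each one visible.

```python
import pandas as pd

# Hypothetical sample data; the column names "key" and "value" are illustrative.
df = pd.DataFrame({
    "key": ["a", "b", "a", "b", "c"],
    "value": [1, 2, 3, 4, 5],
})

# 1. Split: partition the rows by the values in "key".
pieces = {key: group for key, group in df.groupby("key")}

# 2. Apply: sum the "value" column of each piece independently.
partial_sums = {key: int(group["value"].sum()) for key, group in pieces.items()}

# 3. Combine: gather the per-group results into a single Series.
combined = pd.Series(partial_sums, name="value")

# The same three steps expressed as one groupby call.
one_step = df.groupby("key")["value"].sum()
```

In practice the single `groupby(...).sum()` call is the usual form; pandas performs the split and combine steps internally.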

The following diagram demonstrates a simple split-apply-combine process to sum groups of numbers:

[Figure: The split, apply, and combine (SAC) pattern]

This process is very similar to the concepts in MapReduce. In MapReduce, massive sets of data that are too big for a single computer are divided into pieces and dispatched to many systems ...
