Many data analysis problems follow a processing pattern known as split-apply-combine. In this pattern, the data is analyzed in three steps:
- A dataset is split into smaller pieces based on certain criteria
- Each of these pieces is operated upon independently
- All the results are then combined and presented as a single unit
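The three steps above can be sketched in plain Python using only the standard library. This is a minimal illustration, not a definitive implementation: the sample `records` and the helper names `split`, `apply_to_groups`, and `combine` are assumptions made for this example and do not come from the original text.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data: (key, value) pairs invented for illustration.
records = [("a", 1), ("b", 2), ("a", 3), ("b", 6), ("a", 5)]


def split(pairs):
    """Split: bucket the values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def apply_to_groups(groups, func):
    """Apply: run func on each group's values independently."""
    return {key: func(values) for key, values in groups.items()}


def combine(results):
    """Combine: gather the per-group results into one key-sorted list."""
    return sorted(results.items())


groups = split(records)                # {'a': [1, 3, 5], 'b': [2, 6]}
means = apply_to_groups(groups, mean)  # one mean per group
summary = combine(means)               # single combined result
print(summary)
```

Keeping each step in its own function makes the pattern explicit: the function passed to the apply step can be swapped (for example, `sum` or `max` instead of `mean`) without touching the split or combine logic.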
The following diagram demonstrates a simple split-apply-combine process to calculate the mean of values grouped by a character-based key (a or b):
The data is then split by the index label into two groups (one each for a and b). The mean of the values in each ...