Aggregating the data to calculate summary statistics 

To aggregate values over some grouping, pandas has the groupby operation, one of the library's killer features. This method creates a GroupBy object, which can behave as an iterable of (name, group) tuples or, much like a dataframe, lets you select one or many columns the same way you would on a dataframe.
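As a minimal sketch of both behaviors, the snippet below iterates over a GroupBy object and then selects columns from it. The dataframe and its column names (city, temp, rain) are made up for illustration and do not come from the book's dataset.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome", "Rome"],
    "temp": [3.0, 5.0, 14.0, 16.0, 18.0],
    "rain": [1.0, 0.0, 2.0, 0.0, 1.0],
})

grouped = df.groupby("city")

# A GroupBy object can be iterated as (name, group) pairs,
# where each group is a sub-dataframe for that key
group_sizes = {name: len(group) for name, group in grouped}

# Columns are selected the same way as on a dataframe:
# one column with a string, several with a list
temp_means = grouped["temp"].mean()
multi_means = grouped[["temp", "rain"]].mean()
```

Selecting a single column gives a result indexed by group name, while selecting a list of columns gives a dataframe with one column per selected name.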

Most importantly, those objects have two special methods:

  • agg, which performs the given aggregation function (say, calculating averages) for each group and returns the results as a dataframe with one row per group.
  • transform, which does the same, except that it returns the corresponding group's aggregate value for each row of the original dataframe.
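The difference between the two methods is easiest to see in the shape of the result. The following is a small sketch on made-up data (the city and temp columns are assumptions for illustration): agg collapses each group to one row, while transform keeps one value per original row.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome", "Rome"],
    "temp": [3.0, 5.0, 14.0, 16.0, 18.0],
})

# agg: one row per group, indexed by the group key
per_group = df.groupby("city").agg({"temp": "mean"})

# transform: the group's aggregate repeated for every original row,
# aligned with the original index
per_row = df.groupby("city")["temp"].transform("mean")

# Because the index matches, the result can be attached as a new column,
# for example to compare each reading with its city's average
with_mean = df.assign(temp_city_mean=per_row)
```

Here per_group has two rows (one per city), whereas per_row has five values, one for each row of df.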

The great part of both of ...
