9

Fixing Messy Data When Aggregating

Earlier chapters of this book introduced techniques for generating summary statistics on a whole DataFrame, using methods such as describe, mean, and quantile. This chapter covers more complicated aggregation tasks: aggregating by categorical variables and using aggregation to change the structure of DataFrames.
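As a brief reminder of what whole-DataFrame summaries look like, here is a minimal sketch. The DataFrame and its column names (height, weight) are made up for illustration and do not come from the book's datasets.

```python
import pandas as pd

# Hypothetical data; column names and values are illustrative only.
df = pd.DataFrame({
    "height": [150.0, 160.0, 170.0, 180.0],
    "weight": [50.0, 60.0, 70.0, 80.0],
})

# describe() returns count, mean, std, min, quartiles, and max per numeric column.
overall = df.describe()

# mean() returns one value per column, computed over all rows.
means = df.mean()

# quantile() accepts a list of probabilities and returns one row per probability.
quartiles = df.quantile([0.25, 0.5, 0.75])

print(overall)
print(means)
print(quartiles)
```

Each of these calls summarizes every row at once; none of them distinguishes between groups within the data.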

After the initial stages of data cleaning, analysts spend a substantial amount of their time doing what Hadley Wickham has called split-apply-combine: we subset data by groups, apply some operation to those subsets, and then draw conclusions about the dataset as a whole. In slightly more specific terms, this involves generating descriptive statistics by key categorical variables. ...
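The pattern of splitting by group, applying an operation, and combining the results can be sketched with pandas groupby. This is a minimal illustration under assumed data: the region and sales columns are hypothetical, and the book's own recipes may take a different approach.

```python
import pandas as pd

# Hypothetical data; "region" is the categorical key, "sales" the measure.
df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "sales": [10.0, 20.0, 5.0, 15.0, 40.0],
})

# Split: groupby partitions the rows by region.
# Apply: agg computes each statistic within every group.
# Combine: the results come back as one DataFrame indexed by region.
summary = df.groupby("region")["sales"].agg(["count", "mean", "median"])

print(summary)
```

The result has one row per region rather than one row per observation, which is also a simple case of aggregation changing the structure of a DataFrame.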
