9
Fixing Messy Data When Aggregating
Earlier chapters of this book introduced techniques to generate summary statistics on a whole DataFrame. We used methods such as describe, mean, and quantile to do that. This chapter covers more complicated aggregation tasks: aggregating by categorical variables and using aggregation to change the structure of DataFrames.
After the initial stages of data cleaning, analysts spend a substantial amount of their time doing what Hadley Wickham has called splitting-applying-combining—that is, we subset data by groups, apply some operation to those subsets, and then draw conclusions about a dataset as a whole. In slightly more specific terms, this involves generating descriptive statistics by key categorical variables. ...
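As a minimal sketch of the split-apply-combine pattern described above, the following uses pandas groupby on a small DataFrame. The column names (region, sales) and the values are hypothetical, invented only for illustration; they do not come from the book's datasets.

```python
import pandas as pd

# Hypothetical data, made up for this illustration
df = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "sales": [10.0, 20.0, 5.0, 15.0, 40.0],
})

# Summary statistic on the whole DataFrame, as in earlier chapters
overall_mean = df["sales"].mean()

# Split by the categorical variable, apply several aggregations to each
# group, and combine the results into a new DataFrame indexed by region
by_region = df.groupby("region")["sales"].agg(["count", "mean", "median"])

print(overall_mean)
print(by_region)
```

Note that the result has one row per group rather than one row per observation, which is one way aggregation changes the structure of a DataFrame.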