Chapter 3: Identifying and Fixing Missing Values
I think I speak for many data scientists when I say that few things that seem so small and trivial are of as much consequence as the missing value. We spend a good deal of our time worrying about missing values because they can have a dramatic, and surprising, effect on our analysis. This is most likely to happen when missing values are not random, that is, when they are correlated with a feature or the target. For example, suppose we are doing a longitudinal study of earnings, and individuals with less education are more likely to skip the earnings question each year. There is a decent chance that this will bias our parameter estimate for education.
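To make the earnings example concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the column names `education` and `earnings`, the linear earnings formula, and the skip probabilities are all hypothetical and do not come from a real survey. It shows one way to check whether missingness is correlated with a feature, and how the observed values can drift from the full data when it is.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data: years of education, and earnings that rise with education.
education = rng.integers(8, 21, size=n)  # integer years from 8 to 20
true_earnings = 10_000 + 3_000 * education + rng.normal(0, 8_000, size=n)

# Assumed nonresponse pattern: the chance of skipping the earnings question
# falls linearly from 0.6 at 8 years of education to 0.0 at 20 years.
p_skip = 0.6 * (20 - education) / 12
skipped = rng.random(n) < p_skip

df = pd.DataFrame({"education": education, "earnings": true_earnings})
df.loc[skipped, "earnings"] = np.nan

# Share of missing earnings within each education band.
bands = pd.cut(df["education"], bins=[7, 12, 16, 20],
               labels=["8-12", "13-16", "17-20"])
missing_rate = df["earnings"].isna().groupby(bands, observed=True).mean()

# Mean of the complete data versus the mean of what was actually reported
# (pandas skips NaN values when computing the mean).
true_mean = true_earnings.mean()
observed_mean = df["earnings"].mean()

print(missing_rate)
print(f"true mean: {true_mean:,.0f}  observed mean: {observed_mean:,.0f}")
```

In this sketch the observed mean sits above the true mean, because respondents with less education, who earn less on average in the simulated data, drop out more often. A missing rate that differs sharply across bands is the kind of signal that suggests the values are not missing at random.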
Of course, identifying missing ...