Unit 33 Handling Missing Data

Data is almost never perfect. Some values are firm (no need to worry about them); some are questionable (you have to take them with a grain of salt); and some are simply missing.

pandas traditionally uses numpy.nan to represent missing data, probably so as not to confuse it with any number and because its name resembles the NA (“Not Available”) symbol from the R language. pandas also provides functions for recognizing and imputing missing values.
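As a minimal sketch of what recognizing and imputing can look like, the following uses the pandas methods isna(), fillna(), and dropna(). The series name, its labels, and its values are hypothetical, and filling with the mean is only one possible imputation strategy, chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical readings (illustrative data); the value for "b" is missing.
readings = pd.Series([1.5, np.nan, 3.5], index=["a", "b", "c"])

# Recognize: isna() returns True for each missing entry.
missing_mask = readings.isna()

# nan is not equal to any number, not even to itself,
# so an equality test cannot be used to find it.
nan_equals_itself = np.nan == np.nan

# Impute: replace missing entries with the mean of the observed values.
# mean() skips nan by default.
imputed = readings.fillna(readings.mean())

# Alternative: drop the missing entries instead of filling them.
observed = readings.dropna()

print(missing_mask)
print(imputed)
print(observed)
```

Whether to fill or to drop depends on the analysis: filling keeps the shape of the data but introduces estimated values, while dropping keeps only what was actually observed.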

There are several reasons values may be missing in series and frames: you may have never collected them; you may have collected them but discarded them as inappropriate; or you may have combined several complete data sets, but their combination was ...
