Handling missing data
In data science, a missing value occurs when no value is stored for a field in a record; in other words, when a row has no value for a given column. This is a common scenario, but it can significantly reduce the usefulness of the data, so it needs to be handled explicitly.
DataFrames marks a missing value with missing, the single instance of the Missing type. By default, missing values propagate, poisoning every operation they take part in: an operation that combines valid input with missing will either return missing or fail. Hence, in most cases, missing values need to be addressed in the data-cleaning phase.
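The propagation behavior described above can be seen with a few lines of plain Julia. This is a minimal sketch that uses only functions from Base (skipmissing, coalesce, ismissing); the vector xs and the other variable names are made up for illustration and do not come from the book's dataset.

```julia
# Illustrative values only; `xs` stands in for a column with one gap.
xs = [1, 2, missing, 4]

# Arithmetic and comparisons that involve `missing` return `missing`.
a = 1 + missing
b = (missing == missing)      # `missing`, not `true`

# Reductions are poisoned by a single missing entry.
total = sum(xs)

# Skip the missing entries explicitly to get a usable result.
safe_total = sum(skipmissing(xs))

# Or replace each missing entry with a default value.
filled = coalesce.(xs, 0)

# Using `missing` where a Bool is required fails instead of propagating.
threw = try
    missing > 1 ? "big" : "small"
    false
catch err
    err isa TypeError
end

println(ismissing(a), " ", ismissing(b), " ", ismissing(total))
println(safe_total, " ", filled, " ", threw)
```

Skipping and replacing are only two of the possible ways to deal with the gap; which one is appropriate depends on what the column represents.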
The most common techniques for handling ...