Handling missing values
Checking for missing values and handling them properly is an important step in the data preparation process. If left untreated, they can:
- Prevent the relationships between the variables from being analyzed correctly
- Lead to incorrect interpretation and inference from the data
To see how, go back a few pages to where the describe method is explained and look at its output table. Why do the counts differ for so many of the variables? There are 1310 rows in the dataset, as we saw earlier in the section. Why, then, is the count 1046 for age, 1309 for pclass, and 121 for body? This is because the dataset doesn't have a value for 264 (1310-1046) entries in the age column, 1 (1310-1309) entry in the pclass ...
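The arithmetic above (row count minus the count reported by describe) can be checked directly in pandas. The following is a minimal sketch on a small made-up DataFrame, not the dataset discussed in the text. The values are hypothetical, and only the column names age, pclass, and body are borrowed from the discussion.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration only; NaN marks a missing value.
df = pd.DataFrame({
    "age": [29.0, np.nan, 2.0, np.nan, 25.0],
    "pclass": [1.0, 1.0, 2.0, 3.0, np.nan],
    "body": [np.nan, 135.0, np.nan, np.nan, np.nan],
})

n_rows = len(df)

# The "count" row of describe() reports only the non-missing values per column.
counts = df.describe().loc["count"]

# isnull().sum() counts the missing values per column directly.
missing = df.isnull().sum()

print("Rows in the DataFrame:", n_rows)
print("Non-missing counts from describe():")
print(counts)
print("Missing values per column:")
print(missing)

# Subtracting the counts from the number of rows gives the same result.
print("Rows minus counts:")
print(n_rows - counts)
```

The same two calls, describe and isnull().sum(), should work on any DataFrame, so the subtraction described in the text can be confirmed without reading the counts off the output table by hand.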