Handling missing values

Checking for missing values and handling them properly is an important step in the data preparation process. If they are left untreated, they can:

  • Cause the behavior between the variables to be analyzed incorrectly
  • Lead to incorrect interpretation and inference from the data

To see how, turn back a few pages to where the describe method is explained and look at the output table. Why do the counts differ from one variable to another? There are 1310 rows in the dataset, as we saw earlier in the section. Why, then, is the count 1046 for age, 1309 for pclass, and 121 for body? This is because the dataset doesn't have a value for 264 (1310-1046) entries in the age column, 1 (1310-1309) entry in the pclass ...
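
The arithmetic above (number of rows minus the count reported for a column) can be reproduced directly in pandas. The following is a minimal sketch, assuming pandas and NumPy are installed; the small DataFrame is made-up toy data that only borrows the column names age, pclass, and body, and it is not the Titanic dataset discussed in the text.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (NOT the Titanic dataset); NaN marks a missing entry.
df = pd.DataFrame({
    "age": [29.0, np.nan, 2.0, np.nan, 25.0],
    "pclass": [1.0, 1.0, np.nan, 3.0, 3.0],
    "body": [np.nan, np.nan, 135.0, np.nan, np.nan],
})

n_rows = len(df)                # total number of rows
non_missing = df.count()        # per-column count of non-missing values
missing = n_rows - non_missing  # rows minus count = missing entries per column

# isnull().sum() gives the same per-column result without the subtraction.
missing_direct = df.isnull().sum()

print(missing)
```

Both approaches should agree column by column; the subtraction mirrors the reasoning in the text, while isnull().sum() is the more direct way to ask the same question.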
