Handling missing values

Checking for missing values and handling them properly is an important step in the data preparation process. If they are left untreated, they can:

  • Cause the relationships between the variables to be analyzed incorrectly
  • Lead to incorrect interpretation and inference from the data
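As a small illustration of the second point, the sketch below uses a hypothetical series of ages (not taken from the dataset) in which two values are missing. By default, pandas skips NaN when computing a mean. If the gaps are instead naively filled with 0, the average shifts sharply and leads to a misleading conclusion.

```python
import numpy as np
import pandas as pd

# Hypothetical ages; two passengers have no recorded age (NaN)
ages = pd.Series([22.0, 38.0, np.nan, 35.0, np.nan])

# pandas skips NaN by default, so this averages only the 3 known ages:
# (22 + 38 + 35) / 3
mean_known = ages.mean()

# Filling the missing ages with 0 treats them as real values and drags the
# average down: (22 + 38 + 0 + 35 + 0) / 5
mean_zero_filled = ages.fillna(0).mean()

print(mean_known, mean_zero_filled)
```

Neither number is automatically "right". The point is that the choice of how to treat missing entries changes the result, so it should be made deliberately.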

To see how, go back a few pages to where the describe method was explained and look at its output table. Why do the counts differ for many of the variables? As we saw earlier in this section, the dataset has 1310 rows. Why, then, is the count 1046 for age, 1309 for pclass, and 121 for body? This is because the dataset doesn't have a value for 264 (1310-1046) entries in the age column, 1 (1310-1309) entry in the pclass ...
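The missing-value arithmetic above can be reproduced directly with pandas. The following is a minimal sketch that uses a tiny hypothetical DataFrame in place of the Titanic data. Only the column names age, pclass, and body come from the text, and the values are made up. count() gives the non-missing entries per column, as in the count row of describe(). isnull().sum() counts the missing ones directly.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical stand-in for the Titanic data (values are invented)
df = pd.DataFrame({
    "pclass": [1, 2, np.nan, 3],
    "age": [29.0, np.nan, np.nan, 40.0],
    "body": [np.nan, np.nan, 135.0, np.nan],
})

# Non-missing entries per column (the "count" row of describe())
non_missing = df.count()

# Missing entries per column: isnull() marks NaN as True, sum() counts them
missing = df.isnull().sum()

# The same numbers by subtraction, like 1310 - 1046 for age in the text
missing_by_subtraction = len(df) - non_missing

print(missing)
print(missing_by_subtraction)
```

On the full dataset, the same two approaches would give the per-column gaps described above without doing the subtraction by hand.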
