July 2017
Intermediate to advanced
360 pages
8h 26m
English
Sometimes a dataset can contain missing features, so there are a few options that can be taken into account:
The first option is the most drastic one and should be considered only when the dataset is quite large, the number of missing features is high, and any prediction could be risky. The second option is much more difficult because it's necessary to determine a supervised strategy to train a model for each feature and, finally, to predict their value. Considering all pros and cons, the third option is likely to be the best choice. scikit-learn offers the class Imputer ...
Read now
Unlock full access