Effective Amazon Machine Learning by Alexis Perrier

Missing values

Data aggregation, extraction, and consolidation are often imperfect and sometimes result in missing values. There are several common strategies for dealing with missing values in datasets:

  • Removing all the rows with missing values from the dataset. This is simple to apply, but you may end up throwing away a big chunk of information that would have been valuable to your model.
  • Using models that are, by nature, not impacted by missing values, such as decision tree-based models (random forests, boosted trees). Unfortunately, the linear regression model, and by extension the SGD algorithm, does not work with missing values (http://facweb.cs.depaul.edu/sjost/csc423/documents/missing_values.pdf).
  • Imputing the missing data with replacement ...
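
The first and third strategies above can be sketched in a few lines of plain Python. This is a minimal illustration, not the book's own code: the toy dataset, the function names drop_incomplete and impute_mean, and the choice of the column mean as the replacement value are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical toy dataset; None marks a missing value.
rows = [
    {"age": 25.0, "income": 50000.0},
    {"age": None, "income": 62000.0},
    {"age": 40.0, "income": None},
    {"age": 31.0, "income": 58000.0},
]


def drop_incomplete(rows):
    """Row removal: keep only the rows that have no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def impute_mean(rows):
    """Imputation (assumed variant): fill each gap with the column mean.

    The mean is computed over observed values only. A column with no
    observed values at all would raise statistics.StatisticsError.
    """
    columns = list(rows[0].keys())
    means = {c: mean(r[c] for r in rows if r[c] is not None) for c in columns}
    # Build new dicts so the original rows are left untouched.
    return [
        {c: (r[c] if r[c] is not None else means[c]) for c in columns}
        for r in rows
    ]


dropped = drop_incomplete(rows)
imputed = impute_mean(rows)
print(len(rows), len(dropped), len(imputed))
```

On this toy data, row removal discards half of the rows, while imputation keeps all four. That is the trade-off the list describes: dropping rows is simple but can throw away a large share of the information.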
