Data aggregation, extraction, and consolidation are often imperfect and sometimes result in missing values. There are several common strategies for dealing with missing values in datasets:
- Removing all the rows with missing values from the dataset. This is simple to apply, but you may end up throwing away a big chunk of information that would have been valuable to your model.
- Using models that are inherently unaffected by missing values, such as decision tree-based models (random forests, boosted trees). Unfortunately, the linear regression model, and by extension the SGD algorithm, does not work with missing values (http://facweb.cs.depaul.edu/sjost/csc423/documents/missing_values.pdf).
- Imputing the missing data with replacement ...
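
To make the trade-off between the first and last strategies concrete, here is a minimal sketch in plain Python. The toy dataset, the column names (`age`, `income`), and the helper functions `drop_incomplete` and `impute_mean` are all hypothetical and exist only for illustration. Filling gaps with the column mean is just one possible replacement choice, assumed here for simplicity; it is not necessarily the one used for the data in this chapter.

```python
from statistics import mean

# Hypothetical toy dataset: None marks a missing value.
rows = [
    {"age": 34, "income": 52000.0},
    {"age": None, "income": 61000.0},
    {"age": 45, "income": None},
    {"age": 29, "income": 48000.0},
]


def drop_incomplete(rows):
    """Removal strategy: keep only rows that have no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def impute_mean(rows):
    """Imputation strategy (assumed variant): fill gaps with the column mean."""
    columns = list(rows[0].keys())
    # Compute each column's mean from the values that are present.
    means = {
        c: mean(r[c] for r in rows if r[c] is not None) for c in columns
    }
    # Build new rows, substituting the mean wherever a value is missing.
    return [
        {c: (r[c] if r[c] is not None else means[c]) for c in columns}
        for r in rows
    ]


complete = drop_incomplete(rows)
imputed = impute_mean(rows)

print(f"rows kept after dropping: {len(complete)} of {len(rows)}")
print(f"rows kept after imputing: {len(imputed)} of {len(rows)}")
```

On this toy data, dropping incomplete rows discards half of the dataset, while imputation keeps every row at the cost of introducing estimated values.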