June 2017
Beginner to intermediate
576 pages
15h 22m
English
Although it is always important to understand the source of your missing values, how you ultimately handle them depends on the technique that you use to analyse your data sets. For example, classification methods such as decision trees and random forests can often cope with missing values, since they can treat them as a separate category, so you can safely leave them in the model. However, if a variable has a large proportion of missing values, say more than 20%, you might want to look at imputation techniques, or try to find a better variable that measures the same thing.
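As a rough illustration of the 20% rule of thumb, the following sketch measures the share of missing values per variable, flags any variable above the threshold, and fills the gaps in a lightly affected variable with the median of its observed values. The data, the column names, and the helper functions `missing_fraction` and `impute_median` are all hypothetical, and median imputation is only one of several possible techniques, chosen here for brevity.

```python
from statistics import median

# Hypothetical toy data set; None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "score": None},
    {"age": 45, "income": 48000, "score": None},
    {"age": 29, "income": None, "score": 0.7},
    {"age": 41, "income": 61000, "score": None},
    {"age": 38, "income": 57000, "score": 0.4},
    {"age": 52, "income": 45000, "score": 0.9},
]

THRESHOLD = 0.20  # the "say more than 20%" rule of thumb from the text


def missing_fraction(rows, column):
    """Share of rows in which `column` is missing (None)."""
    return sum(r[column] is None for r in rows) / len(rows)


def impute_median(rows, column):
    """Return a copy of `rows` with missing values in `column`
    replaced by the median of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


# Share of missing values for every variable.
fractions = {col: missing_fraction(rows, col) for col in rows[0]}

# Variables that exceed the threshold deserve a closer look:
# consider imputation or a better variable that measures the same thing.
flagged = [col for col, frac in fractions.items() if frac > THRESHOLD]

# A variable with only a few gaps can be filled in directly.
imputed = impute_median(rows, "income")

print(fractions)
print(flagged)
```

Here only `score` is flagged, since half of its values are missing, whereas `income` has a single gap and is a more reasonable candidate for simple imputation. The original `rows` list is left unchanged, so the imputed and raw versions can be compared.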