In this world of big data, the problem of missing data is widespread. It is the rare database that contains no missing values at all. How the analyst deals with the missing data may change the outcome of the analysis, so it is important to learn methods for handling missing data that will not bias the results.
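As a small illustration of how the handling choice can change an outcome, the sketch below computes a mean under two simple strategies. The income figures are hypothetical, and the two strategies (dropping missing values, and filling them with the constant 0) are chosen only for illustration; they are not necessarily the methods covered in this text.

```python
from statistics import mean

# Hypothetical survey incomes (in $1000s); None marks a missing response.
incomes = [30, 45, None, 60, None, 75]

# Strategy A (assumed for illustration): ignore the missing values.
observed = [x for x in incomes if x is not None]
mean_dropped = mean(observed)

# Strategy B (assumed for illustration): replace each missing value with 0.
filled = [0 if x is None else x for x in incomes]
mean_filled = mean(filled)

print("mean after dropping missing values:", mean_dropped)
print("mean after filling with 0:", mean_filled)
```

On this toy data the first strategy gives a mean of 52.5 and the second gives 35, so the same data set supports two quite different conclusions depending only on how the gaps were treated.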
Missing data may arise from any of several different causes. Survey data may be missing because the respondent refuses to answer a particular question, or simply skips a question by accident. Experimental observations may be missed due to inclement weather or equipment failure. Data may be lost through noisy transmission, and so on.
In Chapter 2 we learned three common methods for handling missing data:
We learned that there were problems with each of these methods, which could generate inappropriate data values that ...