In the previous chapters, the focus was on data that are missing at random (MAR); that is, the missing mechanism depends only on the observed variables. Under the MAR assumption, the observed-data likelihood can be written as the product of a part for the missing mechanism and a part for the outcome model. Provided the two parts do not share parameters, maximum likelihood estimation of the outcome model can therefore be carried out separately from estimation of the missing mechanism; this property is called “ignorability.”
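To make the factorization concrete, the following derivation sketches it in one common notation, which is an assumption here and may differ from the notation used elsewhere in the book: $y = (y_{\mathrm{obs}}, y_{\mathrm{mis}})$ is the outcome split into observed and missing parts, $x$ is a fully observed covariate, $\delta$ is the missing indicator, $\theta$ is the parameter of the outcome model, and $\phi$ is the parameter of the missing mechanism.

```latex
\begin{align*}
L_{\mathrm{obs}}(\theta, \phi)
  &= \int f(y_{\mathrm{obs}}, y_{\mathrm{mis}} \mid x; \theta)\,
          P(\delta \mid x, y_{\mathrm{obs}}, y_{\mathrm{mis}}; \phi)\, dy_{\mathrm{mis}} \\
  &= P(\delta \mid x, y_{\mathrm{obs}}; \phi)
     \int f(y_{\mathrm{obs}}, y_{\mathrm{mis}} \mid x; \theta)\, dy_{\mathrm{mis}}
     && \text{(MAR)} \\
  &= P(\delta \mid x, y_{\mathrm{obs}}; \phi)\, f(y_{\mathrm{obs}} \mid x; \theta).
\end{align*}
```

The second line uses MAR to pull the missing mechanism out of the integral, since it no longer depends on $y_{\mathrm{mis}}$. If $\theta$ and $\phi$ vary freely of each other, maximizing over $\theta$ involves only the last factor.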
However, the MAR assumption cannot be verified from the observed data alone without assuming a parametric model. The missing mechanism may well depend on the missing data themselves, or on other unobserved variables that are associated with the missing data. For example, in a cancer clinical trial, suppose the outcome variable is a biomarker that reflects the treatment effect and is observed only at the end of the study. Subjects may decide to drop out before the endpoint because the treatment seems to be ineffective, which violates the MAR assumption. When the data are missing not at random (MNAR), the observed-data likelihood involves the joint distribution of the outcome and the missing indicator, and this joint distribution no longer factors into separate outcome and missing-mechanism parts.
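The practical consequence can be illustrated with a small simulation. The sketch below is not taken from the text: the linear outcome model, the logistic response probabilities, all parameter values, and the function names (`expit`, `ols`, `simulate`) are assumptions chosen for illustration. It fits the outcome regression on respondents only, once when response depends on the covariate alone (MAR) and once when it depends on the outcome itself (MNAR, mimicking subjects with poor outcomes dropping out).

```python
import math
import random

BETA0, BETA1 = 0.0, 0.5  # assumed true intercept and slope of the outcome model


def expit(t):
    """Logistic function, used here as an assumed response probability."""
    return 1.0 / (1.0 + math.exp(-t))


def ols(pairs):
    """Least-squares (intercept, slope) for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate(n=20000, seed=0):
    """Complete-case regression fits under a MAR and an MNAR mechanism."""
    rng = random.Random(seed)
    mar, mnar = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                      # always observed
        y = BETA0 + BETA1 * x + rng.gauss(0.0, 1.0)  # outcome, possibly missing
        if rng.random() < expit(x):        # MAR: response depends on x only
            mar.append((x, y))
        if rng.random() < expit(2.0 * y):  # MNAR: response depends on y itself
            mnar.append((x, y))
    return {"mar": ols(mar), "mnar": ols(mnar)}


if __name__ == "__main__":
    for name, (b0, b1) in simulate().items():
        print(f"{name}: intercept={b0:.3f}, slope={b1:.3f}")
```

Under the MAR mechanism the respondents-only fit should stay close to the assumed true values, whereas under the MNAR mechanism the intercept drifts upward because low outcomes are under-represented among respondents. Correcting this would require modeling the outcome and the missing indicator jointly.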
Like the MAR assumption, the MNAR assumption cannot be tested nonparametrically from the observed data alone. The decision to consider MNAR models should be based mainly on scientific knowledge about the reasons ...