Chapter 1 Missing Data Concepts and Motivating Examples

1.1 Overview of the Missing Data Problem

Data are the fundamental building blocks of valid statistical inference in biomedical and social sciences research. Unfortunately, for many reasons, more often than not some observations will be missing. Data are sometimes missing by design, such as in two-stage case-cohort designs. There are also situations in which missing data are not relevant to the analysis and can therefore be safely ignored. It is thus important to understand what we mean by missing data in this book. According to Little et al. (2012b), missing data are defined as values that are not available but would be meaningful for analysis if they were observed. Even when data are missing, the goal remains to make inferences about the population targeted by the complete sample. Unfortunately, there is no universal method for handling a missing data problem. This is because the selection of subjects for a study is usually known, but the process by which observations on those subjects become missing (the missingness mechanism) is usually unknown, and the data alone cannot definitively inform us about this process. Therefore, with missing data, additional assumptions are required in order to proceed with analysis, and the validity of these assumptions cannot be determined from the observed data alone. For this reason, assessing the sensitivity of conclusions to the assumptions should play a central role in any ...

Excerpt from Applied Missing Data Analysis in the Health Sciences (O’Reilly online learning).