1 Introduction

1.1 The Problem of Missing Data

Standard statistical methods have been developed to analyze rectangular data sets. Traditionally, the rows of the data matrix represent units, also called cases, observations, or subjects depending on context, and the columns represent characteristics or variables measured for each unit. The entries in the data matrix are nearly always real numbers, either representing the values of essentially continuous variables, such as age and income, or representing categories of response, which may be ordered (e.g., level of education) or unordered (e.g., race, sex). This book concerns the analysis of such a data matrix when some of the entries in the matrix are not observed. For example, respondents in a household survey may refuse to report income; in an industrial experiment, some results are missing because of mechanical failures unrelated to the experimental process; in an opinion survey, some individuals may be unable to express a preference for one candidate over another.
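As a rough illustration of such a data matrix, the sketch below builds a small rectangular data set in Python in which some entries are not observed. The variable names, the values, and the use of `None` to mark an unobserved entry are all assumptions made for this example; they are not taken from any real survey. The helper functions `missing_indicator` and `missing_per_column` are likewise hypothetical names chosen here.

```python
# Hypothetical toy data: each row is a unit, each column a variable.
# None marks an entry that was not observed (a convention assumed here).
columns = ["age", "income", "education"]
data = [
    [34, 52000.0, 2],
    [51, None, 3],     # this respondent did not report income
    [27, 31000.0, None],
    [45, None, 1],
]


def missing_indicator(matrix):
    """Return a 0/1 matrix of the same shape: 1 where an entry is unobserved."""
    return [[1 if value is None else 0 for value in row] for row in matrix]


def missing_per_column(matrix, names):
    """Count the unobserved entries in each column, keyed by variable name."""
    indicator = missing_indicator(matrix)
    return {name: sum(row[j] for row in indicator) for j, name in enumerate(names)}


indicator = missing_indicator(data)
counts = missing_per_column(data, columns)

for row in indicator:
    print(row)
print(counts)
```

Keeping the pattern of unobserved entries in a separate 0/1 matrix, rather than only inside the data, is one simple way to inspect which variables and which units are affected before any analysis is attempted.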

In the first two examples, it is natural to treat the values that are not observed as missing, in the sense that there are actual underlying values that would have been observed if survey techniques had been better or the industrial equipment had been better maintained. In the third example, however, it is less clear that a well-defined candidate preference has been masked by the nonresponse; thus, it is less natural to treat the unobserved values as missing. Instead, ...
