8 Missing data and imputation methods

Alessandra Mattei, Fabrizia Mealli and Donald B. Rubin

Missing data are a pervasive problem in many data sets and seem especially widespread in social and economic studies, such as customer satisfaction surveys. Imputation is an intuitive and flexible way to handle the incomplete data sets that result. We discuss imputation, multiple imputation (MI), and other strategies to handle missing data, together with their theoretical background. Our focus is on MI, which is a statistically valid strategy for handling missing data, although we also review other valid approaches, such as direct maximum likelihood and Bayesian methods for estimating parameters, as well as less sound methods. Creating multiply-imputed data sets is more challenging than analyzing them, but is still straightforward compared with other valid methods, and we discuss available software for MI. Some examples and advice on computation are provided using the ABC 2010 annual customer satisfaction survey. Ad hoc methods, including using singly-imputed data sets, almost always lead to invalid inferences and should be eschewed.

8.1 Introduction

Missing values are a common problem in many data sets and seem especially widespread in social and economic studies, including customer satisfaction surveys, where customers may fail to express their satisfaction level concerning their experience with a specific business because of lack of interest, unwillingness to criticize ...

From Modern Analysis of Customer Surveys: with applications using R.