9
Outliers and robustness for ordinal data
This chapter tackles the topics of robustness and multivariate outlier detection for ordinal data. We first review outlier detection methods in regression for continuous data and give an example showing that graphical tools of data analysis and traditional diagnostic measures based on all the observations are not sufficient to detect multivariate atypical observations. We then focus on ordinal data and illustrate how to detect atypical measurements in customer satisfaction surveys. Next, we review the generalized linear model of ordinal regression and apply it to the ABC survey. The chapter concludes with an analysis of a set of diagnostics for checking the goodness of the suggested model and the presence of anomalous observations.
9.1 An overview of outlier detection methods
There are several definitions of outliers in the statistical literature (see Barnett and Lewis, 1994; Atkinson et al., 2004; Hadi et al., 2009). A commonly used definition is that outliers are a minority of observations that depart from the common pattern followed by the rest of the data set, a pattern which can be captured by some statistical model. The assumption here is that there is a homogeneous core of at least 50% of the observations, while the remaining observations (hopefully few) show patterns that are inconsistent with this common pattern. Awareness of outliers in some form or another has existed for at least 2000 years. Thucydides, ...