Chapter 10 Models For Difficult Data

We have introduced three major families of models so far: the linear regression model for normal data, the generalized linear model for certain kinds of non-normal data, and the hierarchical model for data that have a hierarchical structure. These three families cover many situations that we encounter in business. Still, we regularly come across data that do not conform to the assumptions underlying these models. This chapter investigates how to relax some of the assumptions of regression models. We examine extensions to regression models that handle common challenges such as outliers, heteroscedasticity, and certain kinds of missing data. We focus primarily on the linear regression model, but the extensions we discuss can often be applied more broadly.

10.1 Living with Outliers—Robust Regression Models

Outliers appear with some regularity in business data. Some firms or individuals may be truly exceptional, while others may dramatically underperform. Statistical rules of thumb often recommend eliminating these cases according to some guideline, such as whether an observation departs from the mean by more than a certain number of standard deviations. Yet these rules of thumb bring problems of their own. Venables and Ripley (2002, p. 120) list several reasons why routine deletion of outliers can be problematic. First, a binary yes/no decision ...
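The standard-deviation rule of thumb mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `flag_by_sd_rule`, the cutoff parameter `k`, and the `revenue` figures are all hypothetical and do not come from the text. The sketch shows how sensitive the yes/no verdict is to the chosen cutoff, partly because the extreme value itself inflates the standard deviation it is measured against.

```python
import statistics

def flag_by_sd_rule(values, k):
    """Flag observations more than k sample standard deviations from the mean.

    Hypothetical helper for illustration only; not a method from the text.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [abs(v - mean) > k * sd for v in values]

# Made-up data: nine similar observations and one extreme one (25.0).
revenue = [10.2, 9.8, 10.5, 9.9, 10.1, 10.3, 9.7, 10.0, 10.4, 25.0]

flags_k2 = flag_by_sd_rule(revenue, k=2.0)  # flags only the last observation
flags_k3 = flag_by_sd_rule(revenue, k=3.0)  # flags nothing at all

print(flags_k2)
print(flags_k3)
```

With these made-up numbers, the extreme observation is deleted under a cutoff of two standard deviations but kept under a cutoff of three, even though the data are identical. The all-or-nothing outcome depends entirely on an arbitrary threshold.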
