Appendix E

Analyzing the Regression Equation

It is a test of true theories not only to account for but to predict phenomena.

—William Whewell

Outliers

An outlier is an observation with a large residual, that is, a large deviation between the observed value and the fitted value. An outlier reflects one of the following conditions:

  1. An error in collecting or manipulating the data for the given point.
  2. The existence of a significant extraneous causal factor that affected only the outlier(s).
  3. The omission of an important explanatory variable from the equation.
  4. A structural flaw in the model.
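
Because an outlier is defined by the size of its residual, a first screening step is to fit the equation, compute each residual, and compare it with the typical residual size. The sketch below does this for a one-variable linear regression in plain Python. The cutoff of 2.0 residual standard errors and the helper names fit_line and flag_outliers are illustrative assumptions, not a rule from this text, and the data are synthetic. Flagging a point only identifies a candidate; deciding which of the four conditions above applies still requires examining the point itself.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def flag_outliers(xs, ys, threshold=2.0):
    """Return indices whose residual exceeds `threshold` residual standard errors.

    The threshold is a hypothetical screening choice, not a fixed rule.
    """
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    n = len(xs)
    # Residual standard error, using n - 2 degrees of freedom (two fitted coefficients).
    se = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * se]


# Synthetic data: y = 1 + 2x exactly, except one observation shifted upward by 10.
xs = list(range(1, 11))
ys = [2 * x + 1 for x in xs]
ys[5] += 10

flagged = flag_outliers(xs, ys)
print("candidate outliers at indices:", flagged)
```

Note that the outlier also inflates the residual standard error used as the yardstick, so with few observations a single aberrant point can partly mask itself; this is one reason the screening threshold should not be applied mechanically.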

The presence of outliers indicates a deficiency in the model. After verifying that an outlier is not the result of an error, one should try to identify the factors responsible for the aberrant behavior. If the outlier can be explained by a missing variable that affected all observations, then this variable should be included in the equation. If, however, the outlier was the consequence of an isolated event that is not expected to recur, then it should be viewed as an unrepresentative point, and the regression should be rerun with the outlier deleted. This recalculation is important because the method of least squares used to derive the regression coefficients minimizes the sum of squared residuals and therefore gives disproportionate weight to outliers. Thus, one or two such points could seriously distort the fitted regression equation. However, unless the isolated causes of the outlier have been identified, one should avoid the temptation of deleting ...
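
The distorting effect of squaring can be seen with a small synthetic example: a deviation of 20 contributes 400 to the sum of squared residuals, so the fitted line tilts toward that single point. The sketch below, which assumes a one-variable linear model and uses hypothetical helper names (fit_line, refit_without), fits the data with and without one aberrant observation and compares the slopes. It illustrates only the recalculation step; it does not establish that deletion is justified, which depends on identifying an isolated cause as described above.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def refit_without(xs, ys, drop):
    """Rerun the regression after deleting the observations at indices in `drop`."""
    dropped = set(drop)
    keep = [i for i in range(len(xs)) if i not in dropped]
    return fit_line([xs[i] for i in keep], [ys[i] for i in keep])


# Synthetic data: true relationship y = 1 + 2x, plus one aberrant observation at x = 10.
xs = list(range(1, 11))
ys = [2 * x + 1 for x in xs]
ys[9] += 20

a_all, b_all = fit_line(xs, ys)
a_clean, b_clean = refit_without(xs, ys, drop=[9])
print(f"with outlier:    intercept = {a_all:.4f}, slope = {b_all:.4f}")
print(f"outlier deleted: intercept = {a_clean:.4f}, slope = {b_clean:.4f}")
```

Here one point out of ten shifts the estimated slope by roughly half its true value, while the refit recovers the underlying relationship. With real market data the refit will not be this clean, but the comparison of coefficients before and after deletion is a direct way to gauge how much a suspect point is driving the equation.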

Get A Complete Guide to the Futures Market, 2nd Edition now with O’Reilly online learning.