APPENDIX B AIC IS PRESS!
B.1 INTRODUCTION
The model selection problem in linear regression involves choosing a best model (or at least some best models) from a group of candidate models. One approach is to pick the model that most closely fits the data, which would appear to mean picking the model with the lowest sum of squared errors (SSE), but there is an obvious flaw. There is an over-fitting bias in the way SSE is estimated: the criterion used to fit a model (maximum likelihood, which for normal errors is least squares) minimizes SSE. In particular, if explanatory variables are sequentially added to a regression model, SSE never increases and almost always shrinks, suggesting the absurd conclusion that the most complex model is always the one that should be chosen. Put another way, using the same data both to estimate the parameters and to assess fit produces ever more optimistic estimates of model fit with each additional parameter.
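The shrinking of SSE under added variables can be checked numerically. The following is a minimal sketch, assuming NumPy and simulated data; the function name `sse_by_model_size` and the simulated design are illustrative choices, not part of the original text. Only one regressor truly affects the response, and the remaining columns are pure noise.

```python
import numpy as np

def sse_by_model_size(X, y):
    """Return SSE for nested least-squares fits on the first k columns of X, k = 1..p."""
    sses = []
    for k in range(1, X.shape[1] + 1):
        beta, *_ = np.linalg.lstsq(X[:, :k], y, rcond=None)
        resid = y - X[:, :k] @ beta
        sses.append(float(resid @ resid))
    return sses

# Simulated data (an assumption for illustration): y depends on x1 only.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
noise = rng.normal(size=(n, 5))            # regressors unrelated to y
X = np.column_stack([np.ones(n), x1, noise])

sses = sse_by_model_size(X, y)
for k, s in enumerate(sses, start=1):
    print(f"{k} columns: SSE = {s:.3f}")
```

Each added noise column lowers SSE a little even though it carries no information about the response, which is the over-fitting bias described above.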
One approach to the model selection problem is to restate it constructively: pick the model with the best SSE once a correction is made for over-fitting bias. What is desired is a measure much like SSE that reflects how well the current model might fit fresh data.
B.2 PRESS
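Allen (1971, 1974) presents a useful approach to correcting for over-fitting bias in SSE based on a cross-validation idea: $\mathrm{PRESS} = \sum_{i=1}^{n} (y_i - \hat{y}_{(i)})^2$, where $\hat{y}_{(i)}$ is the prediction of $y_i$ from the model fitted with observation $i$ left out. Computation of PRESS ...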
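The cross-validation idea can be sketched in code. This is a minimal illustration, assuming PRESS is the usual prediction sum of squares, in which each observation is predicted from a fit that omits it. The function names `press_loo` and `press_hat` and the simulated data are hypothetical. The second function uses a well-known identity for linear least squares (the leave-one-out residual equals the ordinary residual divided by one minus the leverage); it is shown as one possible shortcut, not necessarily the computation the text goes on to describe.

```python
import numpy as np

def press_loo(X, y):
    """PRESS by brute force: refit n times, each time leaving one observation out."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += float((y[i] - X[i] @ beta) ** 2)
    return total

def press_hat(X, y):
    """PRESS from a single fit, using residuals e_i and leverages h_ii."""
    Q, _ = np.linalg.qr(X)                 # reduced QR of the design matrix
    h = np.sum(Q ** 2, axis=1)             # leverages: diagonal of the hat matrix
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return float(np.sum((e / (1.0 - h)) ** 2))

# Simulated data (an assumption for illustration).
rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, 0.0, -1.0]) + rng.normal(size=n)

beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
sse = float(np.sum((y - X @ beta_full) ** 2))
press_a = press_loo(X, y)
press_b = press_hat(X, y)
print(f"SSE = {sse:.3f}, PRESS (refit) = {press_a:.3f}, PRESS (leverage) = {press_b:.3f}")
```

The two PRESS values should agree up to rounding, and PRESS is never smaller than SSE because every residual is divided by a factor between 0 and 1 before squaring.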