11 MODEL SELECTION AND CROSS-VALIDATION

11.1 BACKGROUND

Several situations will be presented in which more than one model is proposed to fit a set of data. In these situations, both hypothesis testing and information criteria can be used for model selection. The two approaches are rooted in the same basic idea: a more complex model will generally fit the data better than a simpler one, so model complexity must be balanced against goodness of fit when deciding which model is best. Two other strands enter into the discussion: the candidate models may not be nested, and what is meant by the best model requires clarification. Does one mean the model most likely to have generated the data, or the model most likely to make good future predictions?

For linear models with normal errors, fit can be quantified as SSE, which is just the sum of the squared residuals. For all models, complexity can be characterized by the dimension of the parameter space, p. Although this sounds quite intimidating, for frugally parameterized models (always the case in this book), it is just the number of parameters estimated using OLS. Hence, every model selection problem contains a table with the values of SSE and p for each model. Different approaches to model selection just use these values in different ways. Once the principles of ...
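As a minimal sketch of such a table, the Python snippet below fits two nested candidate models by OLS, a constant and a linear trend, and records SSE and p for each. The data values and the helper names (`fit_constant`, `fit_line`, `sse`) are hypothetical and chosen only for illustration; they do not come from the book.

```python
# Hypothetical toy data: a short series with a roughly linear trend.
t = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]


def mean(values):
    return sum(values) / len(values)


def fit_constant(t, y):
    """OLS fit of y = b0. Returns the fitted values and p = 1."""
    b0 = mean(y)
    return [b0 for _ in t], 1


def fit_line(t, y):
    """OLS fit of y = b0 + b1 * t. Returns the fitted values and p = 2."""
    t_bar, y_bar = mean(t), mean(y)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    b1 = sxy / sxx
    b0 = y_bar - b1 * t_bar
    return [b0 + b1 * ti for ti in t], 2


def sse(y, fitted):
    """Sum of the squared residuals."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Build the (model, p, SSE) table that the selection methods start from.
table = []
for name, fit in [("constant", fit_constant), ("linear trend", fit_line)]:
    fitted, p = fit(t, y)
    table.append((name, p, sse(y, fitted)))

for name, p, value in table:
    print(f"{name:<13} p = {p}  SSE = {value:.4f}")
```

Because the constant model is nested in the linear trend model, the larger model cannot have a larger SSE. The selection question is whether the drop in SSE justifies the extra parameter.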
