Chapter 11. Comparing Models with Resampling
Once we create two or more models, the next step is to compare them to understand which one is best. In some cases, comparisons might be within-model, where the same model might be evaluated with different features or preprocessing methods. Alternatively, between-model comparisons, such as when we compared linear regression and random forest models in Chapter 10, are the more common scenario.
In either case, the result is a collection of resampled summary statistics (e.g., RMSE, accuracy) for each model. In this chapter, we'll first demonstrate how workflow sets can be used to fit multiple models. Then, we'll discuss important aspects of resampling statistics. Finally, we'll look at how to formally compare models (using either hypothesis testing or a Bayesian approach).
Creating Multiple Models with Workflow Sets
In Chapter 7 we described the idea of a workflow set where different preprocessors and/or models can be combinatorially generated. In Chapter 10, we used a recipe for the Ames data that included an interaction term as well as spline functions for longitude and latitude. To demonstrate more with workflow sets, let's create three different linear models that add these preprocessing steps incrementally; we can test whether these additional terms improve the model results. We'll create three recipes then combine them into a workflow set: