Learning Data Science

by Sam Lau, Joseph Gonzalez, Deborah Nolan
September 2023
Beginner
596 pages
15h 31m
English
O'Reilly Media, Inc.

Chapter 16. Model Selection

So far when we fit models, we have used a few strategies to decide which features to include:

  • Assess model fit with residual plots.

  • Connect the statistical model to a physical model.

  • Keep the model simple.

  • Compare improvements in the standard deviation of the residuals and in the MSE between increasingly complex models.

For example, when we examined the one-variable model of upward mobility in Chapter 15, we found curvature in the residual plot. Adding a second variable greatly improved the fit in terms of average loss (MSE and, relatedly, multiple R²), but some curvature remained in the residuals. A seven-variable model made little improvement over the two-variable model in terms of a decrease in MSE, so although the two-variable model still showed some patterns in the residuals, we opted for this simpler model.
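The comparison described above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the upward-mobility data from Chapter 15: the seven candidate features, the coefficients, and the helper name `fit_mse` are all assumptions made up for the example. Only the first two features actually drive the outcome, so the pattern should resemble the one in the text: a large drop in MSE from one variable to two, and little further gain from two to seven.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic data (an assumption for illustration, not the book's data):
# y depends only on the first two of seven candidate features.
X = rng.normal(size=(n, 7))
y = 1.0 + 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)


def fit_mse(X_sub, y):
    """Fit least squares with an intercept and return the training MSE."""
    design = np.column_stack([np.ones(len(y)), X_sub])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return float(np.mean(residuals ** 2))


# Nested models: the first 1, 2, and 7 columns of X.
mse_by_size = {k: fit_mse(X[:, :k], y) for k in (1, 2, 7)}
for k, mse in mse_by_size.items():
    print(f"{k}-variable model: training MSE = {mse:.3f}")
```

Because the models are nested, the training MSE can never increase as features are added, which is why a small decrease alone is weak evidence in favor of the larger model.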

As another example, when we model the weight of a donkey in Chapter 18, we will take guidance from a physical model. We’ll ignore the donkey’s appendages and draw on the similarity between a barrel and a donkey’s body to begin fitting a model that explains the donkey’s weight by its body length and girth (comparable to a barrel’s height and circumference). We’ll then continue to adjust that model by adding categorical features related to the donkey’s physical condition and age, collapsing categories, and excluding other possible features to keep the model simple.
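One way the barrel analogy might translate into a model form is sketched below. This is an illustrative derivation under stated assumptions, not necessarily the model fitted in Chapter 18: the body is idealized as a cylinder of length $\ell$ and girth (circumference) $g$, weight $W$ is taken to be proportional to volume $V$, and the constants $c$ and $c'$ are hypothetical.

```latex
\begin{align*}
r &= \frac{g}{2\pi}
  && \text{(radius from circumference)} \\
V &= \pi r^{2} \ell = \frac{g^{2}\,\ell}{4\pi}
  && \text{(volume of a cylinder)} \\
W &\approx c\,V = c'\, g^{2}\, \ell
  && \text{(weight proportional to volume, } c' = c/(4\pi)\text{)} \\
\log W &\approx \log c' + 2\log g + \log \ell
  && \text{(linear in the logged features)}
\end{align*}
```

Under these assumptions, a linear model on the log scale would be expected to give a girth coefficient near 2 and a length coefficient near 1; departures from those values would indicate where the barrel idealization breaks down.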

The decisions we make in building these models are based on judgment calls, and in ...


ISBN: 9781098112998