Learning Data Science

by Sam Lau, Joseph Gonzalez, Deborah Nolan
September 2023
Beginner
596 pages
15h 31m
English
O'Reilly Media, Inc.
Content preview from Learning Data Science

Chapter 19. Classification

This chapter continues our foray into the fourth stage of the data science lifecycle: fitting and evaluating models to understand the world. So far, we’ve described how to fit a constant model using absolute error (Chapter 4) and simple and multiple linear models using squared error (Chapter 15). We’ve also fit linear models with an asymmetric loss function (Chapter 18) and with regularized loss (Chapter 16). In all of these cases, we aimed to predict or explain the behavior of a numeric outcome—bus wait times, smoke particles in the air, and donkey weights are all numeric variables.

In this chapter we expand our view of modeling. Instead of predicting numeric outcomes, we build models to predict nominal outcomes. These sorts of models enable banks to predict whether a credit card transaction is fraudulent or not, doctors to classify tumors as benign or malignant, and your email service to identify spam and set it aside from your usual emails. This type of modeling is called classification and occurs widely in data science.

Just as with linear regression, we formulate a model, choose a loss function, fit the model by minimizing average loss for our data, and assess the fitted model. But unlike linear regression, our model is not linear, the loss function is not squared error, and our assessment compares different kinds of classification errors. Despite these differences, the overall structure of model fitting carries over to this setting. Together, regression ...
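The four steps named above (formulate a model, choose a loss, minimize average loss, assess the fit) can be sketched in a few lines of Python. The preview does not say which model or loss the chapter uses, so this sketch assumes a one-feature logistic model and log loss (cross-entropy), fit with plain gradient descent. The data, the function names, and the 0.5 threshold are all made up for illustration and are not taken from the book.

```python
import math

# Hypothetical toy data: one numeric feature and a 0/1 label (not from the book).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

def sigmoid(z):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def avg_log_loss(theta0, theta1, xs, ys):
    """Average log loss of the assumed logistic model on the data."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(theta0 + theta1 * x)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)

def fit(xs, ys, lr=0.1, steps=5000):
    """Minimize average log loss with plain gradient descent."""
    theta0, theta1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals (predicted probability minus label) drive both gradients.
        resid = [sigmoid(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        g0 = sum(resid) / n
        g1 = sum(r * x for r, x in zip(resid, xs)) / n
        theta0 -= lr * g0
        theta1 -= lr * g1
    return theta0, theta1

def confusion_counts(theta0, theta1, xs, ys, threshold=0.5):
    """Count the four outcome types: the two kinds of errors and the two kinds of hits."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for x, y in zip(xs, ys):
        pred = 1 if sigmoid(theta0 + theta1 * x) >= threshold else 0
        if pred == 1 and y == 1:
            counts["tp"] += 1
        elif pred == 0 and y == 0:
            counts["tn"] += 1
        elif pred == 1 and y == 0:
            counts["fp"] += 1  # false positive
        else:
            counts["fn"] += 1  # false negative
    return counts

start_loss = avg_log_loss(0.0, 0.0, xs, ys)
theta0, theta1 = fit(xs, ys)
fitted_loss = avg_log_loss(theta0, theta1, xs, ys)
counts = confusion_counts(theta0, theta1, xs, ys)

print("loss before fitting:", round(start_loss, 3))
print("loss after fitting: ", round(fitted_loss, 3))
print("confusion counts:   ", counts)
```

The assessment step reports false positives and false negatives separately instead of a single error number, which is one way to read "different kinds of classification errors". Which error matters more depends on the application, for example a missed fraudulent transaction versus a flagged legitimate one.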



Publisher Resources

ISBN: 9781098112998