Many of the prediction questions that we face are classification questions. You might want to predict a website user’s intent among several options, a speaker’s political affiliation among a few parties, or the subject of an untagged image. These examples fit in our usual regression framework, where response y is a function of inputs x. The difference is that y now represents membership in a category: y ∊ {1, 2, . . ., m}. The prediction question is, given a new x, what is our best guess at the response category, ŷ?
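
To make the prediction question concrete, here is a minimal Python sketch of one simple classifier: a majority vote among the k training points closest to the new x. It is an illustration only, not the book's own code. The function name `knn_predict`, the toy data, the Euclidean distance, and the choice k = 3 are all assumptions made for this example.

```python
from collections import Counter
import math


def knn_predict(train_x, train_y, new_x, k=3):
    """Guess the category of new_x by majority vote among its k closest training points."""
    # Euclidean distance from new_x to every training input, paired with that point's category
    dists = [(math.dist(x, new_x), y) for x, y in zip(train_x, train_y)]
    # Closest training points first
    dists.sort(key=lambda pair: pair[0])
    # Count the categories among the k nearest points
    votes = Counter(y for _, y in dists[:k])
    # Most common category wins; ties go to the category seen first among the neighbors
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two inputs per observation, categories y in {1, 2, 3}
train_x = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (2.0, 0.0), (2.1, 0.1)]
train_y = [1, 1, 2, 2, 3, 3]

# Best guess at the response category for a new x
y_hat = knn_predict(train_x, train_y, (0.95, 0.9), k=3)
print(y_hat)
```

Here the two closest training points belong to category 2 and the third closest to category 1, so the vote returns 2. Nothing in the sketch depends on there being only two categories; the response can take any of the m labels.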

Nearest Neighbors

We have already worked with two-category classification via logistic regression, where y ∊ {0, ...
