Data Science from Scratch, 2nd Edition

by Joel Grus
May 2019
Beginner
403 pages
9h 18m
English
O'Reilly Media, Inc.

Chapter 12. k-Nearest Neighbors

If you want to annoy your neighbors, tell the truth about them.

Pietro Aretino

Imagine that you’re trying to predict how I’m going to vote in the next presidential election. If you know nothing else about me (and if you have the data), one sensible approach is to look at how my neighbors are planning to vote. Living in Seattle, as I do, my neighbors are invariably planning to vote for the Democratic candidate, which suggests that “Democratic candidate” is a good guess for me as well.

Now imagine you know more about me than just geography: perhaps you know my age, my income, how many kids I have, and so on. To the extent my behavior is influenced (or characterized) by those things, looking just at the neighbors who are close to me across all those dimensions seems likely to be an even better predictor than looking at all my neighbors. This is the idea behind nearest neighbors classification.

The Model

Nearest neighbors is one of the simplest predictive models there is. It makes no mathematical assumptions, and it doesn’t require any sort of heavy machinery. The only things it requires are:

  • Some notion of distance

  • An assumption that points that are close to one another are similar
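
As an illustration of the first requirement, here is a minimal sketch of one possible notion of distance: Euclidean distance between two points represented as lists of numbers. The names `Vector` and `euclidean_distance` are assumptions made for this example, not necessarily what the chapter goes on to use, and other distance functions would work just as well.

```python
import math
from typing import List

# Hypothetical type alias: a point is a list of numeric features.
Vector = List[float]

def euclidean_distance(v: Vector, w: Vector) -> float:
    """Straight-line distance between two points with the same number of dimensions."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    # Sum the squared differences per dimension, then take the square root.
    return math.sqrt(sum((v_i - w_i) ** 2 for v_i, w_i in zip(v, w)))

# Made-up feature vectors (age, income in thousands, number of kids), for illustration only.
me = [35.0, 60.0, 2.0]
similar_person = [36.0, 62.0, 2.0]
different_person = [70.0, 20.0, 0.0]

# Under the "close points are similar" assumption, the first person is the better guide.
print(euclidean_distance(me, similar_person) < euclidean_distance(me, different_person))
```

Note that features on very different scales (years versus dollars, say) can dominate a distance like this one, so in practice the choice of distance and of units matters.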

Most of the techniques we’ll see in this book look at the dataset as a whole in order to learn patterns in the data. Nearest neighbors, on the other hand, quite consciously neglects a lot of information, since the prediction for each new point depends only on the handful ...
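
To make the idea concrete, here is a rough sketch of a nearest neighbors classifier that predicts a label by majority vote among the k closest labeled points. It is an assumption-laden illustration rather than the book's own code: the function name `knn_classify`, the use of `math.dist` (Python 3.8+) as the distance, and the tie-breaking behavior are all choices made for this example.

```python
import math
from collections import Counter
from typing import List, Tuple

# Hypothetical types: a point is a list of floats, paired with a string label.
Point = List[float]
LabeledPoint = Tuple[Point, str]

def knn_classify(k: int, labeled_points: List[LabeledPoint], new_point: Point) -> str:
    """Predict a label for new_point by majority vote among its k nearest neighbors."""
    # Order the labeled points from nearest to farthest (Euclidean distance).
    by_distance = sorted(labeled_points,
                         key=lambda pair: math.dist(pair[0], new_point))
    # Only the k closest points get a vote; everything else is ignored.
    k_nearest_labels = [label for _, label in by_distance[:k]]
    # Counter.most_common breaks ties by first occurrence, which here means
    # the label of the nearer neighbor wins. This is a simplification.
    return Counter(k_nearest_labels).most_common(1)[0][0]

# Made-up data, for illustration only.
points = [
    ([0.0, 0.0], "A"),
    ([0.0, 1.0], "A"),
    ([5.0, 5.0], "B"),
    ([6.0, 5.0], "B"),
    ([5.0, 6.0], "B"),
]

print(knn_classify(3, points, [0.5, 0.5]))
print(knn_classify(3, points, [5.0, 5.5]))
```

The sketch sorts every labeled point for each prediction, which is simple but slow for large datasets; it is meant only to show how little of the data each prediction actually uses.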


ISBN: 9781492041122