Chapter 6
k-Nearest Neighbors
In Chapter 5, we introduced logistic regression as one of several methods for assigning a label or class to new data (classification). In this chapter, we introduce another classification approach, one that assigns a class to an unlabeled data point based on the most common class among the existing data points most similar to it. This method is known as k-nearest neighbors.
The nearest neighbors algorithm belongs to a family of algorithms known as lazy learners. These learners do not build a model during training, which means they do no real learning up front. Instead, they simply refer back to the training data during the prediction phase in order to assign labels to unlabeled data. Lazy learners are also referred to as instance-based learners or rote learners because of this heavy reliance on the training set. Despite their simplicity, lazy learners such as k-nearest neighbors can be quite effective on data that is otherwise difficult to characterize: datasets with a large number of features and a large number of instances whose classes are fairly similar.
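To make the "no model, just look up the training data at prediction time" idea concrete, here is a minimal sketch of the lazy-learning prediction step. The book works in R, but this illustration uses Python purely for brevity; the function and variable names (`knn_predict`, `train_X`, `train_y`) are hypothetical, not from the text.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Note there is no training step at all: the entire "model" is the raw
    training data, consulted only when a prediction is requested.
    """
    # Euclidean distance from the query to every stored training instance
    dists = [math.dist(x, query) for x in train_X]
    # Indices of the k closest training points
    nearest = sorted(range(len(train_X)), key=lambda i: dists[i])[:k]
    # Most common class among those k neighbors wins
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy example: two well-separated clusters of labeled points
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["healthy", "healthy", "healthy", "disease", "disease", "disease"]
print(knn_predict(train_X, train_y, (0.5, 0.5), k=3))  # → healthy
print(knn_predict(train_X, train_y, (5.5, 5.5), k=3))  # → disease
```

The key design consequence of laziness is visible here: prediction requires scanning all stored instances, so k-NN trades fast (nonexistent) training for slower classification.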
By the end of this chapter, you will have learned the following:
- How to quantify the similarity between new and existing data
- How to choose the appropriate number of “neighbors” (k) to use in classifying new data
- How the k-NN classification process works
- How to use the k-NN classifier to assign labels to new data in R
- The strengths and weaknesses of the k-NN method
DETECTING HEART DISEASE
As we explore the nearest neighbors ...