Chapter 6
k-Nearest Neighbors
In Chapter 5, we introduced logistic regression as one of several methods for assigning a label or class to new data (classification). In this chapter, we introduce another classification approach, one that assigns a class to an unlabeled data point based on the most common class among the existing data points most similar to it. This method is known as k-nearest neighbors.
The nearest neighbors algorithm belongs to a family of algorithms known as lazy learners. These learners do not build a model during training, which means they do no real learning up front. Instead, they simply refer back to the training data during the prediction phase in order to assign labels to unlabeled data. Lazy learners are also referred to as instance-based learners or rote learners because of this heavy reliance on the training set. Despite their simplicity, lazy learners such as k-nearest neighbors can be quite effective on data that is otherwise difficult to characterize: datasets with a large number of features and a large number of instances whose classes are fairly similar.
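To make the "no model, just look up the training data at prediction time" idea concrete, here is a minimal sketch of the lazy-learning prediction step. The book works in R, but this illustration uses Python purely for brevity; the function and variable names (`knn_predict`, `train_X`, `train_y`) are hypothetical, not from the text.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Note there is no training step at all: the entire "model" is the raw
    training data, consulted only when a prediction is requested.
    """
    # Euclidean distance from the query to every stored training instance
    dists = [math.dist(x, query) for x in train_X]
    # Indices of the k closest training points
    nearest = sorted(range(len(train_X)), key=lambda i: dists[i])[:k]
    # Most common class among those k neighbors wins
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy example: two well-separated clusters of labeled points
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["healthy", "healthy", "healthy", "disease", "disease", "disease"]
print(knn_predict(train_X, train_y, (0.5, 0.5), k=3))  # → healthy
print(knn_predict(train_X, train_y, (5.5, 5.5), k=3))  # → disease
```

The key design consequence of laziness is visible here: prediction requires scanning all stored instances, so k-NN trades fast (nonexistent) training for slower classification.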
By the end of this chapter, you will have learned the following:
- How to quantify the similarity between new and existing data
- How to choose the appropriate number of “neighbors” (k) to use in classifying new data
- How the k-NN classification process works
- How to use the k-NN classifier to assign labels to new data in R
- The strengths and weaknesses of the k-NN method
DETECTING HEART DISEASE
As we explore the nearest neighbors ...