Chapter 15. K-Nearest Neighbors
15.0 Introduction
The K-Nearest Neighbors classifier (KNN) is one of the simplest yet most commonly used classifiers in supervised machine learning. KNN is often considered a lazy learner: it doesn’t technically train a model to make predictions. Instead, an observation is predicted to be the class held by the largest proportion of its k nearest observations. For example, if an observation with an unknown class is surrounded by observations of class 1, then the observation is classified as class 1. In this chapter we will explore how to use scikit-learn to create and use a KNN classifier.
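To make the voting rule concrete, here is a minimal sketch of it in plain NumPy. The function name knn_predict, the toy data, and the choice of Euclidean distance are assumptions made for illustration; this is not how scikit-learn implements KNN internally.

```python
from collections import Counter

import numpy as np


def knn_predict(features, targets, observation, k=3):
    """Predict a class by majority vote among the k nearest observations."""
    features = np.asarray(features, dtype=float)
    observation = np.asarray(observation, dtype=float)
    # Euclidean distance from the new observation to every known observation
    distances = np.linalg.norm(features - observation, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Count the classes of those neighbors and return the most common one
    votes = Counter(targets[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two well-separated groups
toy_features = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
toy_targets = [0, 0, 0, 1, 1, 1]

prediction_near_origin = knn_predict(toy_features, toy_targets, [0.3, 0.3], k=3)
prediction_near_five = knn_predict(toy_features, toy_targets, [4.8, 5.0], k=3)
print(prediction_near_origin, prediction_near_five)
```

No model is fitted here; all of the work happens at prediction time, which is why KNN is described as a lazy learner.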
15.1 Finding an Observation’s Nearest Neighbors
Problem
You need to find an observation’s k nearest observations (neighbors).
Solution
Use scikit-learn’s NearestNeighbors:
# Load libraries
from sklearn import datasets
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler
# Load data
iris = datasets.load_iris()
features = iris.data
# Create standardizer
standardizer = StandardScaler()
# Standardize features
features_standardized = standardizer.fit_transform(features)
# Two nearest neighbors
nearest_neighbors = NearestNeighbors(n_neighbors=2).fit(features_standardized)
# Create an observation
new_observation = [1, 1, 1, 1]
# Find distances and indices of the observation's nearest neighbors
distances, indices = nearest_neighbors.kneighbors([new_observation])
# View the nearest neighbors
features_standardized[indices ...
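As a minimal sketch of what kneighbors returns, the block below repeats the setup from this recipe and inspects the two arrays. The variable nearest_rows and the indexing with indices[0] are illustrative choices made here, not necessarily the expression used in the original solution.

```python
from sklearn import datasets
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Same setup as the recipe: standardized iris features, two neighbors
iris = datasets.load_iris()
features_standardized = StandardScaler().fit_transform(iris.data)
nearest_neighbors = NearestNeighbors(n_neighbors=2).fit(features_standardized)

new_observation = [1, 1, 1, 1]
distances, indices = nearest_neighbors.kneighbors([new_observation])

# Both arrays have one row per query observation and one column per neighbor
print(distances.shape, indices.shape)

# Row 0 belongs to the single query observation; use its indices
# to look up the neighbors' standardized feature values
nearest_rows = features_standardized[indices[0]]
print(nearest_rows)
```

Because a list containing one observation was passed in, each returned array has a single row, and indices holds row positions in features_standardized rather than feature values.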