Chapter 5
K-Means Clustering
IN THIS CHAPTER
Mastering k-means clustering
k-means clustering irises in R
k-means clustering the glass data set
In unsupervised learning, a machine learning (ML) process looks for structure in a data set. The objective is to find patterns, not make predictions. One way to structure a data set is to put the data points into subgroups called clusters. The trick is to find a recipe for creating the clusters. One such recipe is called k-means clustering.
How It Works
To introduce k-means clustering, I show you how to work with the iris data frame, as I have in previous chapters in this Book. This is the iris data frame that’s in the base R installation. Fifty flowers in each of three iris species (setosa, versicolor, and virginica) make up the data set. The data frame columns are Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species.
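To confirm that layout on your own machine, a quick check like the following may help. It's a minimal sketch that uses only base R; the variable name species_counts is my own choice for illustration.

```r
# iris ships with R (in the datasets package), so nothing has to be installed.
data(iris)

# List the five columns and their types.
str(iris)

# Count the flowers in each species.
species_counts <- table(iris$Species)
print(species_counts)
```

If the data frame matches the description, str() reports 150 observations of 5 variables and the table shows 50 flowers per species.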
For this discussion, you’re concerned with only Petal.Length, Petal.Width, and Species. That way, you can visualize the data in two dimensions.
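Here's a rough sketch of how that narrowing-down, followed by a first pass at clustering, might look. It relies on kmeans() from the stats package that comes with R. The names petals and km, the seed value, and the settings centers = 3 (one cluster per species) and nstart = 25 are assumptions made for illustration, not values taken from the text.

```r
# Keep the two petal measurements plus the species label.
petals <- iris[, c("Petal.Length", "Petal.Width", "Species")]

# k-means starts from randomly chosen centers, so fix the seed
# to get the same result on every run (the seed value is arbitrary).
set.seed(42)

# Cluster on the two numeric columns only. Species is left out,
# because unsupervised learning doesn't get to see the labels.
# Assumption: centers = 3 and nstart = 25 are illustrative choices.
km <- kmeans(petals[, c("Petal.Length", "Petal.Width")],
             centers = 3, nstart = 25)

# Compare the clusters that k-means found with the actual species.
print(table(Cluster = km$cluster, Species = petals$Species))
```

The cross-tabulation is just a sanity check: it shows how closely the clusters line up with the three species, even though the species labels played no part in forming them.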
Figure 5-1 plots the iris data frame with Petal.Length on the x-axis, Petal.Width on the y-axis, and Species as the color of the plotting character. (For the ggplot ...
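A plot along those lines can be sketched with the ggplot2 package. This is one plausible way to draw it, assuming ggplot2 is installed; it isn't necessarily the code behind the figure, and the object name p is my own.

```r
library(ggplot2)  # assumption: ggplot2 is already installed

# Petal.Length on the x-axis, Petal.Width on the y-axis,
# and Species mapped to the color of each point.
p <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_point()

print(p)
```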