Chapter 10
K-Means Clustering
IN THIS CHAPTER
Mastering k-means clustering
k-means clustering irises in R
k-means clustering the glass data set
In unsupervised learning, a machine learning (ML) process looks for structure in a data set. The objective is to find patterns, not to make predictions. One way to structure a data set is to put the data points into subgroups called clusters, so that points within a cluster are more similar to one another than to points in other clusters. The trick is to find a recipe for creating the clusters. One such recipe is called k-means clustering.
How It Works
To introduce k-means clustering, I show you how to work with the iris data frame, as I have in previous chapters. This data frame comes with the base R installation. Fifty flowers in each of three iris species (setosa, versicolor, and virginica) make up the data set, for 150 rows in all. The data frame columns are Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species.
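If you want to confirm that layout for yourself, a quick look in the R console does the job. This is a minimal sketch that uses only base R functions; the variable name species_counts is my own choice for illustration, not something from the data set.

```r
# iris ships with base R (in the datasets package), so nothing needs installing
data(iris)

# Show the column names and types: four numeric measurements plus a factor
str(iris)

# Count the flowers in each species
# (species_counts is a name chosen for this sketch)
species_counts <- table(iris$Species)
print(species_counts)
```

The str() output lists the five columns, and the table shows how many rows belong to each species.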
For this discussion, you're concerned with only Petal.Length, Petal.Width, and Species. That way, you can visualize the data in two dimensions.
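One way to narrow the data frame to those three columns is sketched here. The name iris_petal is hypothetical, and the scatterplot uses base R graphics as a quick check; it is an assumption-laden stand-in, not the code behind any figure in this chapter.

```r
# Keep only the two petal measurements and the species label
# (iris_petal is a hypothetical name chosen for this sketch)
iris_petal <- iris[, c("Petal.Length", "Petal.Width", "Species")]
head(iris_petal)

# Quick two-dimensional look at the data, one color per species
plot(iris_petal$Petal.Length, iris_petal$Petal.Width,
     col = as.integer(iris_petal$Species), pch = 19,
     xlab = "Petal.Length", ylab = "Petal.Width")
legend("topleft", legend = levels(iris_petal$Species),
       col = seq_along(levels(iris_petal$Species)), pch = 19)
```

Because Species is a factor, as.integer() turns its three levels into the codes 1, 2, and 3, which R maps to three different colors from its default palette.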
Figure 10-1 plots the iris data frame with Petal.Length on the x-axis, Petal.Width on the y-axis, and Species as the color of the plotting character. (For the ggplot details, see ...