IN THIS CHAPTER
k-means clustering irises in R
k-means clustering the glass data set
In unsupervised learning, a machine learning (ML) process looks for structure in a data set. The objective is to find patterns, not make predictions. One way to structure a data set is to put the data points into subgroups called clusters. The trick is to find a recipe for creating the clusters. One such recipe is called k-means clustering.
To introduce k-means clustering, I show you how to work with the iris data frame, as I have in previous chapters. This is the iris data frame that's in the base R installation. Fifty flowers in each of three iris species (setosa, versicolor, and virginica) make up the data set. The data frame columns are Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species.
For this discussion, you're concerned with only Petal.Length, Petal.Width, and Species. That way, you can visualize the data in two dimensions.
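If you want to try this narrowing-down step yourself, here's a minimal sketch. It keeps the two petal measurements along with Species and draws a two-dimensional scatterplot. The names iris_petals and petal_plot are just ones I made up for illustration, and the code assumes the ggplot2 package is installed. Treat it as one plausible way to get such a plot, not necessarily the exact code behind any figure in this chapter.

```r
library(ggplot2)  # assumption: ggplot2 is already installed

# Keep only the two petal measurements and the species label
# (iris_petals is a hypothetical name used for this sketch)
iris_petals <- iris[, c("Petal.Length", "Petal.Width", "Species")]

# Count the flowers in each species
table(iris_petals$Species)

# Scatterplot of petal length versus petal width, colored by species
# (petal_plot is also a hypothetical name)
petal_plot <- ggplot(iris_petals,
                     aes(x = Petal.Length, y = Petal.Width,
                         color = Species)) +
  geom_point()

print(petal_plot)
```

Working with just two numeric columns means every flower is a single point on a flat plot, so any grouping among the points is easy to see.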
Figure 10-1 plots the iris data frame with Petal.Length on the x-axis, Petal.Width on the y-axis, and Species as the color of the plotting character. (For the ggplot details, see ...