Chapter 5

K-Means Clustering

IN THIS CHAPTER

• Mastering k-means clustering
• k-means clustering irises in R
• k-means clustering the glass data set

In unsupervised learning, a machine learning (ML) process looks for structure in a data set. The objective is to find patterns, not to make predictions. One way to structure a data set is to put the data points into subgroups called clusters. The trick is to find a recipe for creating the clusters. One such recipe is k-means clustering.
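The k-means recipe is usually described in three steps. First, choose k starting centers. Next, assign each data point to its nearest center. Finally, move each center to the mean of the points assigned to it, and repeat the last two steps until the assignments stop changing. The following is a minimal base R sketch of that loop (often called Lloyd's algorithm), assuming all-numeric columns and squared Euclidean distance. The helper name simple_kmeans() is hypothetical and exists only for illustration. For real work, R's built-in stats::kmeans() is the function to use, and its default algorithm (Hartigan-Wong) differs in detail from this simple loop.

```r
# A minimal sketch of Lloyd's k-means algorithm in base R.
# simple_kmeans() is a hypothetical helper written for illustration only;
# for real analyses use stats::kmeans().
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Step 1: use k randomly chosen, distinct data points as starting centers
  uniq <- unique(x)
  centers <- uniq[sample(nrow(uniq), k), , drop = FALSE]
  cluster <- integer(nrow(x))  # all 0, so the first pass never "converges"
  for (iter in seq_len(max_iter)) {
    # Step 2: squared Euclidean distance from each point (row) to each center
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
    # Assign each point to its nearest center
    new_cluster <- max.col(-d, ties.method = "first")
    # Stop once no point changes cluster
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Step 3: move each center to the mean of the points assigned to it
    # (an empty cluster simply keeps its old center in this sketch)
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  list(cluster = cluster, centers = centers, iterations = iter)
}

# Example: cluster the iris flowers on their two petal measurements only
set.seed(42)  # the starting centers are random
petals <- iris[, c("Petal.Length", "Petal.Width")]
fit <- simple_kmeans(petals, k = 3)
fit$centers
# Species is not used for clustering; it is only compared afterward
table(cluster = fit$cluster, species = iris$Species)
```

Because the starting centers are chosen at random, set.seed() makes a run repeatable, and different seeds can produce different clusterings. That sensitivity is why stats::kmeans() offers an nstart argument for trying several random starts and keeping the best result.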

How It Works

To introduce k-means clustering, I work with the iris data frame, as I have in previous chapters in this Book. This is the iris data frame that comes with the base R installation. The data set consists of 50 flowers from each of three iris species (setosa, versicolor, and virginica), for 150 rows in all. The data frame columns are Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species.
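A quick way to confirm the structure just described is to inspect the data frame at the R console. This minimal check uses only base R functions; iris lives in the datasets package, which R attaches by default.

```r
dim(iris)            # 150 rows, 5 columns
names(iris)          # "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
table(iris$Species)  # 50 flowers each of setosa, versicolor, and virginica
```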

For this discussion, you’re concerned with only Petal.Length, Petal.Width, and Species. That way, you can visualize the data in two dimensions.
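Here is one way to pull out those three columns and draw the two-dimensional scatterplot with base R graphics. The variable names petal_data and species_colors, along with the color choices, are illustrative assumptions; the chapter's own figure may be drawn with the ggplot2 package instead.

```r
# Keep only the columns this discussion needs
petal_data <- iris[, c("Petal.Length", "Petal.Width", "Species")]

# One plotting color per species (illustrative choices)
species_colors <- c(setosa = "red", versicolor = "darkgreen", virginica = "blue")

# Petal.Length on the x-axis, Petal.Width on the y-axis, colored by Species
plot(petal_data$Petal.Length, petal_data$Petal.Width,
     col = species_colors[as.character(petal_data$Species)],
     pch = 19, xlab = "Petal.Length", ylab = "Petal.Width")
legend("topleft", legend = names(species_colors),
       col = species_colors, pch = 19)
```

Looking up colors by species name, rather than by factor level position, keeps each color tied to its species even if the factor levels are reordered.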

Figure 5-1 plots the iris data frame with Petal.Length on the x-axis, Petal.Width on the y-axis, and Species as the color of the plotting character. (For the ggplot ...
