R Projects For Dummies by Joseph Schmuller

Chapter 10

K-Means Clustering

IN THIS CHAPTER

Mastering k-means clustering

k-means clustering irises in R

k-means clustering the glass data set

In unsupervised learning, a machine learning (ML) process looks for structure in a data set. The objective is to find patterns, not make predictions. One way to structure a data set is to put the data points into subgroups called clusters. The trick is to find a recipe for creating the clusters. One such recipe is called k-means clustering.

How It Works

To introduce k-means clustering, I show you how to work with the iris data frame, as I have in previous chapters. This is the iris data frame that's in the base R installation. Fifty flowers in each of three iris species (setosa, versicolor, and virginica) make up the data set. The data frame columns are Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species.
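If you want to confirm that description for yourself, a quick look at the data frame does the job. This is only a minimal sketch that uses base R functions; it isn't code from the chapter.

```r
# iris ships with base R (in the datasets package), so there's nothing to install
data(iris)

dim(iris)             # number of rows and columns
names(iris)           # the five column names
table(iris$Species)   # number of flowers in each species
```

The table() call should report 50 flowers for each of setosa, versicolor, and virginica.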

For this discussion, you're concerned with only Petal.Length, Petal.Width, and Species. That way, you can visualize the data in two dimensions.
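One way to narrow the data frame down to those three columns is sketched here. The name petals is a hypothetical one chosen for this illustration, not a name from the chapter.

```r
# Keep only the two petal measurements and the species label
# (petals is a hypothetical name used just for this sketch)
petals <- iris[, c("Petal.Length", "Petal.Width", "Species")]

head(petals)   # first six rows of the reduced data frame
```

With two numeric columns left, each flower becomes a single point on an x-y plot.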

Figure 10-1 plots the iris data frame with Petal.Length on the x-axis, Petal.Width on the y-axis, and Species as the color of the plotting character. (For the ggplot details, see ...
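The code behind the figure isn't shown in this excerpt, so the following is just one plausible way to draw a plot like the one described. It assumes the ggplot2 package is installed, and petal_plot is a hypothetical name used only for this sketch.

```r
# Assumes ggplot2 is installed: install.packages("ggplot2")
library(ggplot2)

# Petal.Length on the x-axis, Petal.Width on the y-axis, color by Species
petal_plot <- ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_point()

print(petal_plot)
```

The published figure might differ in details such as point size, theme, or plotting character.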
