In Chapter 11, we introduced association rules, the first of the two unsupervised machine learning approaches that we cover in this book. In that approach, the objective was to develop a set of rules that describe the patterns that exist between events or items in a transaction set. In this chapter, we introduce the second unsupervised machine learning approach: clustering. With clustering, the objective is to find meaningful ways to group items based on some measure of similarity. Clustering has several real-world applications; it is most often applied to problems such as customer segmentation based on demographics or purchase behavior, and the detection of anomalous network activity. In this chapter, we introduce the basic idea behind clustering, discuss the different ways to describe approaches to clustering, explore the mechanics of a common clustering algorithm (k-means clustering), and illustrate how to cluster data in R using the k-means clustering algorithm.
By the end of this chapter, you will have learned the following:
- The basic idea behind clustering as an unsupervised machine learning approach
- How the k-means clustering algorithm works
- How to segment data using the k-means algorithm in R ...