Chapter 10: Hierarchical and k-Means Clustering

  • 10.1 The Clustering Task
  • 10.2 Hierarchical Clustering Methods
  • 10.3 Single-Linkage Clustering
  • 10.4 Complete-Linkage Clustering
  • 10.5 k-Means Clustering
  • 10.6 Example of k-Means Clustering at Work
  • 10.7 Behavior of MSB, MSE, and PSEUDO-F as the k-Means Algorithm Proceeds
  • 10.8 Application of k-Means Clustering Using SAS Enterprise Miner
  • 10.9 Using Cluster Membership to Predict Churn
  • The R Zone
  • References
  • Exercises
  • Hands-On Analysis

10.1 The Clustering Task

Clustering is the grouping of records, observations, or cases into classes of similar objects. A cluster is a collection of records that are similar to one another and dissimilar to records in other clusters. Clustering differs from classification in that it has no target variable: the clustering task does not try to classify, estimate, or predict the value of a target variable. Instead, clustering algorithms seek to segment the entire data set into relatively homogeneous subgroups, or clusters, in which the similarity of records within a cluster is maximized and the similarity to records outside the cluster is minimized.
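To make the idea of "similar within, dissimilar between" concrete, here is a minimal sketch in Python. It is not taken from the book: the six toy records, the two cluster assignments, and the helper name `mean_pairwise_distances` are all hypothetical, and Euclidean distance is assumed as the (inverse) measure of similarity. The sketch compares the average distance between records in the same cluster with the average distance between records in different clusters.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+

# Hypothetical toy records with two numeric fields each (not from the book).
records = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
           (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

# Two assumed cluster assignments: one that matches the visible groups,
# and one that deliberately mixes them.
good_labels = [0, 0, 0, 1, 1, 1]
bad_labels = [0, 1, 0, 1, 0, 1]


def mean_pairwise_distances(records, labels):
    """Return (mean within-cluster distance, mean between-cluster distance).

    Assumes at least one same-cluster pair and one cross-cluster pair exist.
    """
    within, between = [], []
    for i, j in combinations(range(len(records)), 2):
        d = dist(records[i], records[j])
        if labels[i] == labels[j]:
            within.append(d)
        else:
            between.append(d)
    return sum(within) / len(within), sum(between) / len(between)


good_within, good_between = mean_pairwise_distances(records, good_labels)
bad_within, bad_between = mean_pairwise_distances(records, bad_labels)

print(f"good assignment: within={good_within:.3f}, between={good_between:.3f}")
print(f"bad assignment:  within={bad_within:.3f}, between={bad_between:.3f}")
```

Under these assumptions, the assignment that matches the visible groups gives a small within-cluster distance relative to the between-cluster distance, while the mixed assignment does not. Note that no target variable appears anywhere: the labels are an output being evaluated, not a field being predicted.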

For example, the Nielsen PRIZM segments, developed by Claritas, Inc., represent demographic profiles of each geographic area in the United States, in terms of distinct lifestyle types, as defined by zip code. The clusters identified for zip code 90210, Beverly Hills, California, are

  • Cluster # 01: Upper Crust Estates
  • Cluster ...
