Hierarchical clustering

Hierarchical clustering builds a multilevel hierarchy of clusters in which each cluster may contain several subclusters (child clusters), and each child cluster is linked to its parent cluster. Because the clusters form a tree, this form of clustering is also known as tree clustering.
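One way to picture such a hierarchy is as a tree of nodes, each holding its points and its child clusters. The sketch below is a minimal illustration under stated assumptions: 1-D points, a hand-built three-level tree, and the hypothetical names `ClusterNode` and `clusters_at_depth`, none of which come from the text. Cutting the tree at a chosen depth recovers an ordinary flat clustering.

```python
from dataclasses import dataclass, field


# Hypothetical data structure for illustration; not from the original text.
@dataclass
class ClusterNode:
    """One cluster in the hierarchy; leaves hold single data points."""
    points: list
    children: list = field(default_factory=list)

    def is_leaf(self):
        return not self.children


def clusters_at_depth(node, depth):
    """Cut the tree `depth` levels below `node` and return the flat clusters.

    Branches that end before the requested depth contribute their leaf as-is.
    """
    if depth == 0 or node.is_leaf():
        return [node.points]
    result = []
    for child in node.children:
        result.extend(clusters_at_depth(child, depth - 1))
    return result


# A hand-built three-level hierarchy over five 1-D points.
a, b, c = ClusterNode([1.0]), ClusterNode([1.2]), ClusterNode([5.0])
d, e = ClusterNode([9.0]), ClusterNode([9.3])
ab = ClusterNode([1.0, 1.2], [a, b])
abc = ClusterNode([1.0, 1.2, 5.0], [ab, c])
de = ClusterNode([9.0, 9.3], [d, e])
root = ClusterNode([1.0, 1.2, 5.0, 9.0, 9.3], [abc, de])

print(clusters_at_depth(root, 1))  # [[1.0, 1.2, 5.0], [9.0, 9.3]]
print(clusters_at_depth(root, 2))  # [[1.0, 1.2], [5.0], [9.0], [9.3]]
```

Cutting the same tree closer to the root gives fewer, coarser clusters; cutting deeper gives more, finer ones.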

Agglomerative clustering is a bottom-up approach where:

  • Each data point begins in its own cluster
  • The similarity (or distance) between each pair of clusters is evaluated
  • The most similar pair of clusters is found and merged to form a new cluster
  • The process is repeated until only one top-level cluster remains
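The four steps above can be sketched as a naive loop. This is a minimal sketch, assuming points are numeric tuples, Euclidean distance as the dissimilarity measure, and single linkage (the distance between two clusters is the distance between their closest pair of points) as the merge criterion. The function names are illustrative, not from the text. Because every round compares all pairs of clusters, the cost is roughly cubic in the number of points, so it only suits small inputs.

```python
import math
from itertools import combinations


def single_linkage(c1, c2):
    """Cluster distance = Euclidean distance of the closest pair of points."""
    return min(math.dist(p, q) for p in c1 for q in c2)


def agglomerative(points, linkage=single_linkage):
    """Naive bottom-up clustering; returns the merges in the order they happen.

    Each merge is a tuple (cluster_a, cluster_b, distance).
    Illustrative sketch only; names are hypothetical.
    """
    # Step 1: every data point starts in its own cluster.
    clusters = [[p] for p in points]
    merges = []
    # Step 4: repeat until a single top-level cluster remains.
    while len(clusters) > 1:
        # Steps 2-3: evaluate every pair and pick the closest (most similar) one.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        dist = linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        # Delete index j first (j > i) so that index i stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return merges


pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
for a, b, d in agglomerative(pts):
    print(a, "+", b, "at distance", round(d, 3))
```

Passing a different `linkage` function, for example one that takes the maximum pairwise distance (complete linkage) or the average, changes how clusters are merged without touching the loop itself.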

Divisive clustering is a top-down approach that works in reverse, starting with one ...
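As a general illustration of the top-down idea, the sketch below starts with every point in a single cluster and repeatedly splits the largest cluster in two until `k` clusters exist. It assumes a *bisecting* strategy with a small 2-means split seeded at the two farthest-apart points; that strategy, the parameter `k`, and the names `divisive` and `two_means_split` are assumptions for illustration, not necessarily the variant the text goes on to describe. (Spark MLlib's `BisectingKMeans` is a divisive method in this same family.)

```python
import math


def mean(points):
    """Component-wise centroid of a list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def two_means_split(points, iters=10):
    """Split one cluster in two with a tiny 2-means (Lloyd's algorithm).

    Centroids start at the two mutually farthest points, so the result is
    deterministic. Assumes the cluster has at least two distinct points.
    """
    c1, c2 = max(((p, q) for p in points for q in points),
                 key=lambda pq: math.dist(*pq))
    for _ in range(iters):
        # Assign each point to its nearer centroid (ties go left).
        left = [p for p in points if math.dist(p, c1) <= math.dist(p, c2)]
        right = [p for p in points if math.dist(p, c1) > math.dist(p, c2)]
        c1, c2 = mean(left), mean(right)
    return left, right


def divisive(points, k):
    """Top-down clustering: start with one cluster, bisect until k clusters exist."""
    clusters = [list(points)]  # one cluster containing every point
    while len(clusters) < k:
        # Only clusters with two or more distinct points can be split.
        splittable = [c for c in clusters if len(set(c)) > 1]
        if not splittable:
            break
        target = max(splittable, key=len)  # split the largest cluster next
        clusters.remove(target)
        clusters.extend(two_means_split(target))
    return clusters


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0), (21.0, 0.0)]
print(divisive(pts, 2))  # [[(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], [(20.0, 0.0), (21.0, 0.0)]]
print(divisive(pts, 3))  # [[(20.0, 0.0), (21.0, 0.0)], [(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
```

Splitting the largest cluster first is only one possible rule; another common choice is to split the cluster with the greatest internal spread.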

Get Machine Learning with Spark - Second Edition now with the O’Reilly learning platform.