Hierarchical clustering

Hierarchical clustering is a structured clustering approach that produces a multilevel hierarchy of clusters, in which each cluster may contain several subclusters (or child clusters). Each child cluster is thus linked to its parent cluster. This form of clustering is also often called tree clustering.

Agglomerative clustering is a bottom-up approach where:

  • Each data point begins in its own cluster
  • The similarity (or distance) between each pair of clusters is evaluated
  • The most similar pair of clusters is found and merged to form a new cluster
  • The process is repeated until only one top-level cluster remains
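
The agglomerative steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the book's Spark implementation: the function names (single_linkage, agglomerate), the sample points, and the choice of single linkage with Euclidean distance as the similarity measure are all assumptions made for the example.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def single_linkage(a, b):
    """Assumed similarity measure: smallest point-to-point distance between two clusters."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points):
    """Merge clusters bottom-up and return the merges in the order they happened."""
    # Each data point begins in its own cluster.
    clusters = [[p] for p in points]
    merges = []
    # Repeat until only one top-level cluster remains.
    while len(clusters) > 1:
        # Evaluate the distance between each pair of clusters and pick the closest pair.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        # Merge the pair into a new cluster and drop the two originals.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical sample data: two tight pairs of points that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
for left, right in agglomerate(points):
    print(left, "+", right)
```

The recorded sequence of merges is the hierarchy itself: reading it from last to first walks from the single top-level cluster down to the individual points. The sketch recomputes every pairwise distance on each pass, which is fine for a handful of points but would be far too slow for large datasets.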

Divisive clustering is a top-down approach that works in reverse, starting with one ...
