Evidence accumulation

As a basic ensemble, we can cluster the data many times and record the labels from each run. We then count, in a new matrix, how many times each pair of samples was clustered together. This is the essence of the Evidence Accumulation Clustering (EAC) algorithm.
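
To make the pair-counting idea concrete, here is a minimal sketch that skips the clustering itself and starts from label arrays we have written by hand. The function name `co_association_counts` and the toy `runs` data are assumptions made for illustration only.

```python
import numpy as np


def co_association_counts(label_runs):
    """Count, for every pair of samples, the runs that put both in one cluster."""
    label_runs = np.asarray(label_runs)
    n_samples = label_runs.shape[1]
    counts = np.zeros((n_samples, n_samples), dtype=int)
    for labels in label_runs:
        # Boolean matrix: True where sample i and sample j share a label in this run
        counts += labels[:, None] == labels[None, :]
    return counts


# Three hypothetical clustering runs over four samples (one row per run)
runs = [
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 2],
]
counts = co_association_counts(runs)
print(counts)
```

Note that only agreement between samples matters here: the label values themselves are arbitrary and do not need to match from one run to the next.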

EAC has two major steps.

  1. The first step is to cluster the data many times using a lower-level clustering algorithm, such as k-means, and record how often each pair of samples ended up in the same cluster across those iterations. This is stored in a co-association matrix.
  2. The second step is to perform a cluster analysis on the resulting co-association matrix, using another type of clustering algorithm called hierarchical clustering. This has an interesting ...
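
The two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: it uses scikit-learn's `KMeans` for the lower-level clustering and SciPy's `linkage` and `fcluster` for the hierarchical step. The function name `evidence_accumulation`, the parameter values, the choice of single linkage, and the synthetic `make_blobs` data are all our own choices.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def evidence_accumulation(X, n_runs=20, k_per_run=8, n_final_clusters=3, seed=0):
    """Sketch of EAC: repeated k-means, then hierarchical clustering."""
    rng = np.random.RandomState(seed)
    n_samples = X.shape[0]
    coassoc = np.zeros((n_samples, n_samples))

    # Step 1: cluster many times and accumulate pairwise co-occurrence
    for _ in range(n_runs):
        km = KMeans(n_clusters=k_per_run, n_init=1,
                    random_state=rng.randint(2**31 - 1))
        labels = km.fit_predict(X)
        coassoc += labels[:, None] == labels[None, :]
    coassoc /= n_runs  # fraction of runs in which each pair was together

    # Step 2: treat (1 - co-association) as a distance and cluster it
    distance = 1.0 - coassoc
    np.fill_diagonal(distance, 0.0)
    condensed = squareform(distance, checks=False)
    tree = linkage(condensed, method="single")  # single linkage is an assumption
    final_labels = fcluster(tree, t=n_final_clusters, criterion="maxclust")
    return coassoc, final_labels


# Synthetic data, used only to exercise the sketch
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)
coassoc, final_labels = evidence_accumulation(X)
print(coassoc.shape, np.unique(final_labels))
```

In this sketch, each k-means run deliberately asks for more clusters than we expect in the final result, so that the individual runs over-segment the data and the hierarchical step merges the pieces that are most often found together.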

