scikit-learn also allows specifying a connectivity matrix, which acts as a constraint when choosing the clusters to merge: only clusters that are adjacent in the connectivity matrix are considered, so clusters that are far from one another are skipped. A very common way to create such a matrix is the k-Nearest Neighbors (k-NN) graph function (implemented as kneighbors_graph()), which connects each sample to its k nearest neighbors according to a specific metric. In the following example, we consider a circular dummy dataset, without the ground truth:
from sklearn.datasets import make_circles

# Two noisy concentric circles; Y holds the generated labels
nb_samples = 3000
X, Y = make_circles(n_samples=nb_samples, noise=0.05)
A graphical representation is shown in the following figure: ...
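As a minimal sketch of how the connectivity constraint described above might be built and applied, the snippet below generates a k-NN graph with kneighbors_graph() and passes it to AgglomerativeClustering. The values k = 10 and n_clusters = 20, the Ward linkage, and the fixed random_state are illustrative assumptions, not settings taken from the text.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_circles
from sklearn.neighbors import kneighbors_graph

nb_samples = 3000
# random_state is fixed only to make this sketch reproducible (assumption)
X, _ = make_circles(n_samples=nb_samples, noise=0.05, random_state=1000)

# Sparse (nb_samples x nb_samples) matrix: row i marks the k nearest
# neighbors of sample i (Euclidean distance by default)
k = 10  # assumed value, chosen for illustration
connectivity = kneighbors_graph(X, n_neighbors=k, include_self=False)

# Merges are restricted to clusters that are adjacent in the k-NN graph.
# n_clusters and linkage are assumed values for this sketch.
ac = AgglomerativeClustering(n_clusters=20, connectivity=connectivity, linkage='ward')
labels = ac.fit_predict(X)
```

A smaller k makes the constraint stricter (fewer candidate merges), while a larger k moves the result closer to unconstrained agglomerative clustering.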