Connectivity constraints

scikit-learn also allows you to specify a connectivity matrix, which acts as a constraint when finding the clusters to merge. In this way, clusters that are far from one another (non-adjacent in the connectivity matrix) are never considered as merge candidates. A very common way to build such a matrix is the k-Nearest Neighbors (k-NN) graph (implemented by kneighbors_graph()), which connects each sample to its k closest samples according to a specific metric. In the following example, we consider a circular dummy dataset, without using the ground truth:

from sklearn.datasets import make_circles

nb_samples = 3000
X, Y = make_circles(n_samples=nb_samples, noise=0.05)

A graphical representation is shown in the following figure: ...
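The excerpt stops before the clustering step, so the following is only a minimal sketch of how a k-NN connectivity matrix can be passed to AgglomerativeClustering through its connectivity parameter; it is not the book's own code. The values n_neighbors=50, linkage='average', n_clusters=2 and random_state=1000 are assumptions chosen for illustration. An unconstrained model is fitted alongside the constrained one so that the two label assignments can be compared.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_circles
from sklearn.neighbors import kneighbors_graph

nb_samples = 3000
# random_state is an assumption, added only to make the sketch reproducible
X, Y = make_circles(n_samples=nb_samples, noise=0.05, random_state=1000)

# Sparse (nb_samples x nb_samples) connectivity matrix: row i contains a 1
# in the columns of the 50 samples closest to X[i] (Euclidean distance by default).
# n_neighbors=50 is an assumed value, not taken from the original text.
connectivity = kneighbors_graph(X, n_neighbors=50, include_self=False)

# Unconstrained model: any pair of clusters may be merged
ag = AgglomerativeClustering(n_clusters=2, linkage='average')
Y_pred = ag.fit_predict(X)

# Constrained model: only clusters that are adjacent in the k-NN graph can be merged
# (if the graph has several connected components, scikit-learn warns and completes it)
agc = AgglomerativeClustering(n_clusters=2, linkage='average',
                              connectivity=connectivity)
Yc_pred = agc.fit_predict(X)
```

Plotting X colored by Y_pred and by Yc_pred is a simple way to see how the constraint changes which points end up together; larger values of n_neighbors make the graph denser and therefore the constraint weaker.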

