Affinity propagation – automatically choosing cluster numbers
One of the weaknesses of the k-means algorithm is that we need to define upfront the number of clusters we expect to find in the data. When we are not sure what an appropriate choice is, we may need to run many iterations to find a reasonable value. In contrast, the Affinity Propagation algorithm (Frey, Brendan J., and Delbert Dueck. Clustering by passing messages between data points. science 315.5814 (2007): 972-976.) finds the number of clusters automatically from a dataset. The algorithm takes a similarity matrix as input (S) (which might be the inverse Euclidean distance, for example – thus, closer points have larger values in S), and performs the following steps after initializing ...
Get Mastering Predictive Analytics with Python now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.