CLARA and CLARANS: k-Medoids Algorithms for Large Data Sets

CLARA and CLARANS are variants of the PAM algorithm, developed to cope with the high computational demands imposed by large data sets. Both exploit randomized sampling, but each in a different way. Specifically, the idea underlying CLARA is to randomly draw a sample X′ of size N′ from the entire data set, X, and to use the PAM algorithm to determine the set Θ′ of medoids that best represents X′. The rationale is the assumption that, if the sample X′ is drawn in a way that is representative of the statistical distribution of the data points in X, the set Θ′ will be a satisfactory approximation of the set Θ of the ...
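The sampling idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the function names (`dist`, `cost`, `pam`, `clara`), the Euclidean distance, and the simplified greedy swap loop standing in for PAM are all assumptions made for the example. The sketch also assumes the common practice of drawing several samples and keeping the medoid set with the lowest cost on the full data set X, which the excerpt above does not spell out.

```python
import random

def dist(a, b):
    # Euclidean distance between two points given as tuples (an assumed metric)
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cost(points, medoids):
    # Sum of each point's distance to its nearest medoid
    return sum(min(dist(p, m) for m in medoids) for p in points)

def pam(points, k, rng):
    # Simplified stand-in for PAM: start from k random medoids, then accept
    # any medoid/non-medoid swap that lowers the cost, until none does.
    medoids = rng.sample(points, k)
    best = cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                c = cost(points, trial)
                if c < best:
                    medoids, best, improved = trial, c, True
    return medoids

def clara(X, k, sample_size, n_samples=5, seed=0):
    # Draw a random sample X' of size N' (sample_size), find its medoids
    # with the PAM stand-in, and score them on the whole data set X.
    # Repeating over n_samples draws is an assumption of this sketch.
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(X, min(sample_size, len(X)))
        medoids = pam(sample, k, rng)
        c = cost(X, medoids)
        if c < best_cost:
            best_medoids, best_cost = medoids, c
    return best_medoids, best_cost
```

Calling `clara(X, k=2, sample_size=20)` on a list of coordinate tuples returns the selected medoids and their cost on X. The expensive swap search runs only on the N′ sampled points, while each candidate medoid set is evaluated once against all of X, which is where the savings over running PAM on the full data set come from.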

Get Pattern Recognition, 4th Edition now with the O’Reilly learning platform.
