R: Data Analysis and Visualization
by Tony Fischetti, Brett Lantz, Jaynal Abedin, Hrishi V. Mittal, Bater Makhabel, Edina Berlinger, Ferenc Illés, Milán Badics, Ádám Banai, Gergely Daróczi, Barbara Dömötör, Gergely Gabler, Dániel Havran, Péter Juhász, István Margitai, Balázs Márkus, Péter Medvegyev, Julia Molnár, Balázs Árpád Szucs, Ágnes Tuza, Tamás Vadász, Kata Váradi, Ágnes Vidovics-Dancs
The CLARA algorithm
Instead of taking the whole dataset into consideration, the CLARA (Clustering LARge Applications) algorithm draws a small random sample of the data to stand in for the full dataset. Medoids are then chosen from this sample using PAM (Partitioning Around Medoids). If the sample is drawn at random, it should closely represent the original dataset.
CLARA draws multiple samples from the dataset, applies PAM to each sample to find its medoids, and returns the best clustering as the output. First, a sample D' is drawn from the original dataset D, and PAM is applied to D' to find k medoids. These k medoids are then evaluated against the full dataset D to calculate the current dissimilarity. If it is smaller than the one you ...
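The sample-then-evaluate loop described above can be sketched in a few lines. This is a minimal illustration, not the book's implementation. It is written in standard-library Python so that it is self-contained. The function names (`total_cost`, `pam`, `clara`), the default of five samples, the sample size of 40 + 2k, and the Euclidean distance are all assumptions made for the example. Because the text above is cut off, the step that keeps the medoids with the lowest dissimilarity seen so far follows the usual description of CLARA and is likewise an assumption.

```python
import math
import random


def total_cost(points, medoids, dist=math.dist):
    """Total dissimilarity: each point's distance to its nearest medoid, summed."""
    return sum(min(dist(p, m) for m in medoids) for p in points)


def pam(points, k, dist=math.dist):
    """Naive PAM-style swap search (a sketch, not an optimized PAM).

    Starts from the first k points and keeps swapping a medoid for a
    non-medoid while doing so strictly lowers the total cost.
    """
    medoids = list(points[:k])
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    medoids, best = trial, cost
                    improved = True
    return medoids


def clara(data, k, n_samples=5, sample_size=None, dist=math.dist, seed=0):
    """CLARA sketch: run PAM on several random samples D' and keep the
    medoids with the lowest dissimilarity over the full dataset D.

    data is a list of equal-length numeric tuples. The defaults for
    n_samples and sample_size are assumptions chosen for illustration.
    """
    rng = random.Random(seed)
    if sample_size is None:
        sample_size = min(len(data), 40 + 2 * k)
    best_medoids, best_cost = None, math.inf
    for _ in range(n_samples):
        sample = rng.sample(data, sample_size)   # draw D' from D
        medoids = pam(sample, k, dist)           # PAM on the sample only
        cost = total_cost(data, medoids, dist)   # evaluate on all of D
        if cost < best_cost:                     # keep the best clustering so far
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost
```

Calling `clara(data, k=2)` on a list of coordinate tuples returns the chosen medoids together with their total dissimilarity over the full dataset. The design point is that the expensive swap search only ever sees the sample, while the full dataset is touched once per sample to score the result. In R, the `cluster` package provides a `clara()` function for the same purpose.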