13.5. Hierarchical Algorithms for Large Data Sets

As we have seen in Section 13.2, the number of operations for the generalized agglomerative scheme (GAS) is of the order of N^3, and this cannot drop below O(N^2), even if efficient computational schemes are employed. This section is devoted to a special type of hierarchical algorithm that is most appropriate for handling large data sets. As stated elsewhere, the need for such algorithms stems from a number of applications, such as Web mining, bioinformatics, and so on.

The CURE Algorithm

The acronym CURE stands for Clustering Using REpresentatives. The innovative feature of CURE is that it represents each cluster, C, by a set of k > 1 representatives, denoted by R_C. By using ...
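
The passage above does not show how the set R_C is built. As an illustration only, the sketch below follows the procedure commonly described for CURE: pick up to k well-scattered points of the cluster by farthest-point selection, then shrink each one toward the cluster mean by a factor alpha. The function names, the default alpha = 0.3, and the use of Euclidean distance are assumptions made for this example, not details taken from the text.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: Sequence[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def select_representatives(cluster: Sequence[Point], k: int,
                           alpha: float = 0.3) -> List[List[float]]:
    """Return up to k representatives for one cluster (hypothetical helper).

    Assumed procedure: farthest-point selection of scattered points,
    followed by shrinking each selected point toward the mean by alpha.
    """
    if not cluster:
        raise ValueError("cluster must contain at least one point")
    mean = centroid(cluster)

    # First representative: the point farthest from the cluster mean.
    chosen = [max(cluster, key=lambda p: math.dist(p, mean))]

    # Each further representative: the point whose distance to its
    # nearest already-chosen representative is largest.
    while len(chosen) < min(k, len(cluster)):
        nxt = max(cluster,
                  key=lambda p: min(math.dist(p, c) for c in chosen))
        chosen.append(nxt)

    # Shrink every chosen point toward the mean by the factor alpha.
    return [[x + alpha * (m - x) for x, m in zip(p, mean)] for p in chosen]


if __name__ == "__main__":
    square = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 2)]
    for rep in select_representatives(square, k=4, alpha=0.5):
        print(rep)
```

With alpha = 0 the representatives are the scattered points themselves, and with alpha = 1 they all collapse onto the mean, so alpha controls how strongly outlying points are pulled in.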
