Chapter 9 Clustering
Dass mehr Ordnung in die Welt ist, als sie auf den ersten Anblick darbietet, wird erst erkannt, wenn die Ordnung gesucht wird. [That there is more order in the world than presents itself at first sight will only be realized if the order is looked for.]
— Christoph von Sigwart, Logik II, 1878
Do not assume that “clustering” methods are the best way to discover interesting groupings in the data; in our experience the visualization methods are often far more effective. There are many different clustering methods, often giving different answers, and so the danger of over-interpretation is high.
— W. N. Venables and B. D. Ripley, Modern Applied Statistics with S-PLUS, Third Edition, 1999
As described in Chapter 2, the goal of clustering is to learn an unknown, discrete-valued function without observing its outputs. Because no observed outputs of the function are available, no loss function that compares a predicted output to an observed output can be computed. As a result, clustering tends to be driven by algorithms and heuristics rather than by minimizing a meaningful measure of risk, as is done in classification.
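To make this contrast concrete, here is a minimal Python sketch (an illustration, not the chapter's code). A classification loss such as the misclassification rate needs observed outputs. A clustering heuristic, here Lloyd's k-means algorithm chosen only as one familiar example, can only evaluate an internal criterion, the within-cluster sum of squares, computed from the inputs alone. The helper names `misclassification_rate`, `within_cluster_ss`, and `kmeans` are hypothetical and written just for this example.

```python
import numpy as np


def misclassification_rate(y_pred, y_obs):
    """Classification loss: it cannot be computed without observed outputs y_obs."""
    return float(np.mean(np.asarray(y_pred) != np.asarray(y_obs)))


def within_cluster_ss(X, labels, centers):
    """Internal criterion: uses only the inputs X and the algorithm's own output."""
    return float(np.sum((X - centers[labels]) ** 2))


def _assign(X, centers):
    # Index of the nearest center (squared Euclidean distance) for each row of X.
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and center-update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centers at k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _assign(X, centers)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment, consistent with the returned centers.
    return _assign(X, centers), centers


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
    labels, centers = kmeans(X, k=2)
    # The internal criterion is computable without any observed outputs.
    print("within-cluster SS:", within_cluster_ss(X, labels, centers))
    # Cluster labels are arbitrary: swapping them describes the same grouping,
    # yet a classification-style loss between the two labelings is maximal.
    swapped = 1 - labels
    print(misclassification_rate(labels, labels), misclassification_rate(swapped, labels))
    # prints 0.0 1.0
```

The last two lines show why a classification-style loss is not meaningful here: relabeling the same partition changes the apparent "error" from 0 to 1, so the algorithm can only be judged by criteria it defines for itself.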
The clustering algorithms presented in this chapter naturally fall into two groups. Algorithms which are applicable when the observed data are in (for example, ...