These days there is no lack of clustering methods. The novice in search of an appropriate method for his or her classification problem is confronted with a confusing multitude. Many methods are heuristic, and clustering data is often considered an algorithmic activity of geometric and exploratory nature. In his address at the biennial 2013 conference of the International Federation of Classification Societies in Tilburg, the Netherlands, Iven Van Mechelen even spoke of a "classification jungle." While there is no single method that will solve all kinds of clustering problems, this is no excuse not to look for methods that cover broad ranges of applications.
Data may often be perceived as realizations of random variables. In these cases, it is best to start from a probabilistic model of the data, deriving from it estimators and criteria by application of well-founded inferential methods. Probabilistic cluster analysis has a long history, dating back to Newcomb [387] in 1886, Karl Pearson [404] in 1894, and Charlier [86] in 1906. These authors were far ahead of their time. It was only in the second half of the twentieth century that cluster analysis picked up speed. Yet the design of automatic methods is still considered a true challenge, particularly in view of real applications. Moreover, the field does not have a good reputation among practitioners. Dougherty and Brun [131] write: "Although used for many years, data clustering has remained highly problematic, and at a very deep level." Earlier, Milligan [376], p. 358, had complained about the poor performance of clustering methods based on multivariate normal mixture models. He conjectured that the crux might be model over-parameterization. However, this is not the point: identifiability shows that all parameters in a normal mixture are actually needed. One reason for the poor performance is the sheer multitude of local solutions. Another is the possible existence of multiple reasonable solutions. A third is sensitivity to contamination. In the meantime, methods have been developed that remedy these shortcomings.
The present text offers a probabilistic approach to robust mixture and cluster analysis with likelihood-based methods. It thus views mixture and cluster analysis as an instance of statistical inference. The relevant material is spread across various journals and over a broad time frame, and it deserves to be unified in one volume. This book has a special focus on robustness. Of course, it is not possible to write about robust cluster analysis without writing about cluster analysis itself. Despite the book's general title, it is by no means intended as an overview of the literature on cluster analysis; other authors have served this purpose. Instead, the focus is on the methods that the author has found most useful on simulated data and in real applications. Different applications have different needs. The analysis of gene expression data, for instance, requires mainly accuracy. Real-time applications, such as the analysis of magnetic resonance images, also require speed. These are antagonistic goals. Although some remarks on speed will be made in Sections 3.1.5 and 3.1.6, the main emphasis will be on accuracy.
The theoretical parts of this book are written in the "definition-theorem-proof" style, which has long been customary in mathematics. This is intended to facilitate reading: it makes it easier to follow the main theory without having to plunge deeply into details, such as proofs, that are not essential for a basic understanding. The alternative "novel style" makes it difficult to highlight the important issues, namely the definitions and theorems that form the basis of the theory, and the resulting methods.
