Chapter 14
Clustering model evaluation
14.1 Introduction
Reliable model evaluation, discussed for classification and regression models in Chapters 7 and 10, respectively, is just as important for clustering models. Classification and regression models have certain natural quality criteria, but for clustering models it is much less clear how to assess quality objectively. As a result, many more performance measures have been proposed and used, and their outcomes tend to be treated with considerable reserve.
Although this is less widely recognized than for the more common evaluation of classification and regression models, the generalization properties of clustering models may also be of concern. The value of any performance measure on a particular dataset (dataset performance) is therefore a possibly imperfect estimate of the corresponding value on the whole domain (true performance).
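The distinction between dataset performance and true performance can be illustrated with a minimal sketch. It is written in Python for illustration only and makes several assumptions that do not come from the text: a toy k-means procedure stands in for the clustering algorithm, the mean squared distance to the nearest cluster center stands in for the performance measure, and the data are synthetic. All function names (`fit_centers`, `mean_sq_dist_to_center`, `draw_sample`) are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def assign(x, centers):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))

def mean_sq_dist_to_center(data, centers):
    """Illustrative quality measure: lower values mean tighter clusters."""
    return sum(sq_dist(x, centers[assign(x, centers)]) for x in data) / len(data)

def fit_centers(data, k, iterations=10):
    """Toy k-means, initialized with the first k instances (an assumption)."""
    centers = [tuple(x) for x in data[:k]]
    for _ in range(iterations):
        members = [[] for _ in range(k)]
        for x in data:
            members[assign(x, centers)].append(x)
        # Recompute each center as the mean of its members;
        # keep the old center if a cluster happens to be empty.
        centers = [
            tuple(sum(col) / len(m) for col in zip(*m)) if m else c
            for m, c in zip(members, centers)
        ]
    return centers

def draw_sample(rng, n):
    """Synthetic domain: two Gaussian blobs, drawn alternately."""
    blobs = [(0.0, 0.0), (5.0, 5.0)]
    return [tuple(rng.gauss(mu, 1.0) for mu in blobs[i % 2]) for i in range(n)]

rng = random.Random(0)
train = draw_sample(rng, 200)      # dataset used to create the model
heldout = draw_sample(rng, 200)    # fresh sample from the same domain

centers = fit_centers(train, k=2)
dataset_perf = mean_sq_dist_to_center(train, centers)
heldout_perf = mean_sq_dist_to_center(heldout, centers)
print("dataset performance: ", dataset_perf)
print("held-out performance:", heldout_perf)
```

Because the centers are fitted to the creation dataset, the value measured on that dataset can be expected to be somewhat optimistic on average. The value on the held-out sample is itself only an estimate of the true performance, just one that is not biased by model creation.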
Clustering quality measures may, but need not, explicitly use the instance dissimilarity or similarity measures presented in Chapter 11. Those that do are often applied to models created by dissimilarity-based clustering algorithms, in which case it usually makes most sense to use the same dissimilarity measure for both model creation and evaluation.
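One way to keep evaluation consistent with model creation is to pass the dissimilarity measure to the quality measure as a parameter. The Python sketch below assumes a simple illustrative measure, the mean dissimilarity over all pairs of instances that share a cluster. This is not necessarily a measure used in this chapter, and the function names are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean dissimilarity."""
    return math.dist(a, b)

def manhattan(a, b):
    """Manhattan (city-block) dissimilarity."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def mean_within_cluster_dissimilarity(data, labels, dissimilarity):
    """Average dissimilarity over all same-cluster pairs (lower is tighter)."""
    total, count = 0.0, 0
    for (x, lx), (y, ly) in combinations(zip(data, labels), 2):
        if lx == ly:
            total += dissimilarity(x, y)
            count += 1
    return total / count if count else 0.0

# Toy data with assumed cluster assignments.
data = [(0, 0), (3, 4), (10, 10), (13, 14)]
labels = [0, 0, 1, 1]

print(mean_within_cluster_dissimilarity(data, labels, euclidean))  # 5.0
print(mean_within_cluster_dissimilarity(data, labels, manhattan))  # 7.0
```

The two calls give different values for the same cluster assignments. A model created with one dissimilarity measure and evaluated with another is therefore judged against a criterion it was never built to satisfy.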
14.1.1 Dataset performance
The dataset performance of a clustering model is assessed directly by calculating one or more selected performance ...