Chapter 9

Some Final Comments and Guidelines

9.1 Introduction

It should by now be obvious to most readers that the use of cluster analysis in practice does not involve simply the application of one particular technique to the data under investigation, but rather necessitates a series of steps, each of which may be dependent on the results of the preceding one. It is generally impossible a priori to anticipate what combination of variables, similarity measures and clustering techniques is likely to lead to interesting and informative classifications. Consequently, the analysis proceeds through several stages, with the researcher intervening if necessary to alter variables, choose a different similarity measure, concentrate on a particular subset of individuals, and so on. The final, extremely important stage concerns the evaluation of the clustering solutions obtained. Are the clusters real or merely artefacts of the algorithms? Do other solutions exist which are better? Can the clusters be given a convincing interpretation? A long list of such questions (which are full of traps for the unwary; see Dubes and Jain, 1979) might be posed.
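The iterative character of such an analysis can be illustrated with a minimal sketch, offered under stated assumptions rather than as a recommended procedure. It runs a naive agglomerative clustering under each combination of two distance measures and two linkage rules, then reports how far the resulting partitions agree. The function names (`agglomerate`, `rand_index`, `compare_solutions`), the synthetic two-group data, and the use of the Rand index as the agreement measure are all choices made for this illustration and are not taken from the text.

```python
import itertools
import math
import random


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block distance between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def agglomerate(points, dist, linkage, k):
    """Naive agglomerative clustering (hypothetical helper).

    Repeatedly merges the two closest clusters until k remain.
    `linkage` is min (single linkage) or max (complete linkage).
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in itertools.combinations(range(len(clusters)), 2):
            d = linkage(
                dist(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def rand_index(u, v):
    """Proportion of point pairs on which two partitions agree."""
    pairs = list(itertools.combinations(range(len(u)), 2))
    agree = sum((u[i] == u[j]) == (v[i] == v[j]) for i, j in pairs)
    return agree / len(pairs)


def compare_solutions(points, k):
    """Cluster the same data under every distance/linkage combination."""
    dists = {"euclidean": euclidean, "manhattan": manhattan}
    linkages = {"single": min, "complete": max}
    return {
        (dist_name, link_name): agglomerate(points, dist, link, k)
        for (dist_name, dist), (link_name, link)
        in itertools.product(dists.items(), linkages.items())
    }


# Synthetic data: two Gaussian groups (an assumption of this sketch).
rng = random.Random(0)
points = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in [(0.0, 0.0), (5.0, 5.0)]
    for _ in range(10)
]

solutions = compare_solutions(points, k=2)
for (name_a, labels_a), (name_b, labels_b) in itertools.combinations(
    solutions.items(), 2
):
    print(name_a, name_b, round(rand_index(labels_a, labels_b), 3))
```

Close agreement between the solutions is weak evidence that the structure is not an artefact of a single algorithm, while disagreement suggests that the variables, the measure or the method should be reconsidered. Agreement alone does not show that the clusters are real.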

It should also be clear from the preceding chapters that no one clustering method can be judged to be ‘best’ in all circumstances. The various studies that have compared a variety of clustering procedures on artificially generated data all point to the existence of a ‘method × data type’ interaction. In other words, particular methods will be best ...
