Chapter 6

Finite Mixture Densities as Models for Cluster Analysis

6.1 Introduction

Most of the clustering methods discussed in Chapters 4 and 5 were heuristic in the sense that their assumptions about the class structure were not explicitly stated. This does not mean that no such assumptions are made; they remain implicit, which is one reason why different methods for producing the clusters and for determining the number of clusters often give conflicting solutions. Procedures for deciding upon a final cluster solution when using these methods were, in general, informal and subjective. Chapter 5 described some more formal methods based on the optimization of numerical criteria, but these still relied on ad hoc approaches when, for example, deciding on the number of clusters.

In this chapter we introduce an alternative approach to clustering, one that postulates a formal statistical model for the population from which the data are sampled. The model assumes that this population consists of a number of subpopulations (the ‘clusters’), in each of which the variables have a different multivariate probability density function; the result is what is known as a finite mixture density for the population as a whole. With finite mixture densities as models for cluster analysis, the clustering problem becomes that of estimating the parameters of the assumed mixture and then using the estimated parameters to calculate the posterior probabilities of cluster membership. And ...
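The idea of a finite mixture density and of posterior probabilities of cluster membership can be illustrated with a minimal sketch. It assumes a deliberately simple case that is not taken from the text: a single variable, two normal components, and parameter values that are treated as already known (in practice they would have to be estimated). The function names and the numerical values are hypothetical and chosen only for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(x, weights, means, sds):
    """Finite mixture density: weighted sum of the component densities."""
    return sum(p * normal_pdf(x, m, s) for p, m, s in zip(weights, means, sds))


def posterior_probabilities(x, weights, means, sds):
    """Posterior probability of each component given x (Bayes' theorem)."""
    joint = [p * normal_pdf(x, m, s) for p, m, s in zip(weights, means, sds)]
    total = sum(joint)  # equals mixture_density(x, ...)
    return [j / total for j in joint]


# Hypothetical parameter values (assumed known here, not estimated).
weights = [0.4, 0.6]   # mixing proportions, must sum to one
means = [0.0, 4.0]     # component means
sds = [1.0, 1.0]       # component standard deviations

for x in (0.0, 2.0, 4.0):
    post = posterior_probabilities(x, weights, means, sds)
    print(x, [round(p, 3) for p in post])
```

An observation would typically be assigned to the component with the largest posterior probability. In this sketch the point x = 2 lies midway between the two means, and because the standard deviations are equal, its posterior probabilities reduce to the mixing proportions.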