Chapter 7

Model-based Cluster Analysis for Structured Data

7.1 Introduction

In the previous chapter we described how finite mixture models can be used as the basis of a sound statistical approach to cluster analysis. In this chapter we stay with the model-based framework, but consider finite mixture models for clustering data in which, because of the special nature of the data, the subpopulation means and covariance matrices can be described by a reduced set of parameters. This reduction in the number of parameters, achieved by exploiting the structure of the data, helps in two ways:

i. It may lead to more precise parameter estimates and ultimately more informative and more useful cluster analysis solutions.

ii. It may be possible to fit finite mixture models convincingly to smaller data sets. The unstructured models introduced in the previous chapter often contain a very large number of parameters, so that fitting them may be feasible only with very large samples.
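To make the second point concrete, the short Python sketch below counts free parameters in a g-component mixture of p-variate normal distributions. The unstructured count (g − 1 mixing proportions, g mean vectors, g unrestricted covariance matrices) follows directly from the general model. The structured model is a hypothetical illustration chosen for this sketch, not necessarily one used later in the chapter: it assumes the p variables are repeated measurements over time, that each component's means lie on a straight line (intercept and slope), and that each component's covariance matrix has compound symmetry (one variance, one correlation). The function names are likewise illustrative.

```python
def unstructured_gaussian_mixture_params(g: int, p: int) -> int:
    """Free parameters in a g-component mixture of p-variate normals
    with unrestricted mean vectors and covariance matrices."""
    mixing = g - 1                      # proportions must sum to one
    means = g * p                       # one p-vector per component
    covariances = g * p * (p + 1) // 2  # one symmetric p x p matrix per component
    return mixing + means + covariances


def structured_gaussian_mixture_params(g: int) -> int:
    """Hypothetical structured model (an assumption for illustration):
    linear mean profile over time and compound-symmetry covariance in
    each component. The count does not depend on p."""
    mixing = g - 1
    means = g * 2        # intercept and slope per component
    covariances = g * 2  # common variance and common correlation per component
    return mixing + means + covariances


if __name__ == "__main__":
    g = 3
    for p in (2, 5, 10, 20):
        print(f"p={p:2d}  unstructured={unstructured_gaussian_mixture_params(g, p):4d}  "
              f"structured={structured_gaussian_mixture_params(g):3d}")
```

With g = 3 and p = 10, for example, the unstructured model has 197 free parameters, while the structured model has 14. Because the unstructured count grows quadratically in p, the gap widens quickly as more variables are measured.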

Figure 7.1 illustrates this second point. The general finite mixture model assumes that, within each subpopulation, the variables follow a multivariate distribution, with mean vectors and, possibly, covariance matrices that differ between subpopulations. In the figure, the subpopulation is indicated by a latent cluster variable (latent in the sense that it is not known a priori). The arrows pointing from ...