Chapter 7

Model-based Cluster Analysis for Structured Data

7.1 Introduction

In the previous chapter we described how finite mixture models can be used as the basis of a sound statistical approach to cluster analysis. In this chapter we stay within the model-based framework but consider finite mixture models for clustering data whose special nature allows the subpopulation means and covariance matrices to be described by a reduced set of parameters. Reducing the number of parameters by exploiting the structure of the data helps in two ways:

i. It may lead to more precise parameter estimates and ultimately more informative and more useful cluster analysis solutions.

ii. It may be possible to convincingly fit finite mixture models to smaller data sets; the unstructured models introduced in the last chapter often contain a very large number of parameters, with the consequence that it may only be possible to fit the models using very large samples.
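To give a rough sense of scale for the second point, the sketch below counts the free parameters in a mixture under three covariance structures. It assumes multivariate normal components, which the text above does not state explicitly, and the function name `mixture_param_count` and the structure labels are illustrative rather than taken from the book.

```python
def mixture_param_count(p, g, covariance="unrestricted"):
    """Count free parameters in a g-component mixture of p-variate normals.

    Assumption: normal components. The labels 'unrestricted', 'common'
    and 'diagonal' are illustrative names for covariance structures.
    """
    mixing = g - 1            # mixing proportions sum to one
    means = g * p             # one mean vector per component
    if covariance == "unrestricted":
        # a separate full symmetric covariance matrix per component
        cov = g * p * (p + 1) // 2
    elif covariance == "common":
        # one full covariance matrix shared by all components
        cov = p * (p + 1) // 2
    elif covariance == "diagonal":
        # a separate diagonal covariance matrix per component
        cov = g * p
    else:
        raise ValueError(f"unknown covariance structure: {covariance}")
    return mixing + means + cov


if __name__ == "__main__":
    for structure in ("unrestricted", "common", "diagonal"):
        print(structure, mixture_param_count(p=10, g=3, covariance=structure))
```

With 10 variables and 3 components, the unrestricted model has 197 free parameters, against 87 for a common covariance matrix and 62 for diagonal matrices. The unrestricted count grows quadratically in the number of variables, which is why such models tend to need large samples.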

Figure 7.1 illustrates this second point. The general finite mixture model assumes that the variables in each subpopulation follow a multivariate distribution, with different mean vectors and, possibly, different covariance matrices. In the figure the subpopulation is indicated by a latent cluster variable (latent in the sense that it is not known a priori). The arrows pointing from ...
