
Cluster Analysis, 5th Edition by Daniel Stahl, Morven Leese, Sabine Landau, Brian S. Everitt

Chapter 7

Model-based Cluster Analysis for Structured Data

7.1 Introduction

In the previous chapter we described how finite mixture models can be used as the basis of a sound statistical approach to cluster analysis. In this chapter we stay with the model-based framework but consider finite mixture models for data whose special structure allows the subpopulation means and covariance matrices to be described by a reduced set of parameters. Reducing the number of parameters by exploiting the structure of the data helps in two ways:

i. It may lead to more precise parameter estimates and ultimately more informative and more useful cluster analysis solutions.

ii. It may be possible to convincingly fit finite mixture models to smaller data sets; the unstructured models introduced in the last chapter often contain a very large number of parameters, with the consequence that it may only be possible to fit the models using very large samples.
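To give a sense of the scale involved, the sketch below counts the free parameters of a finite mixture under the assumption that each subpopulation is multivariate normal. The function name `n_free_parameters` and the three covariance options are hypothetical choices made for illustration; the "shared" and "diagonal" restrictions are simple examples of a reduced parameter set, not necessarily the structures treated in this chapter.

```python
def n_free_parameters(n_clusters, n_variables, covariance="full"):
    """Count free parameters of a Gaussian mixture (illustrative sketch).

    Assumes multivariate normal components. `covariance` selects:
      "full"     - a separate unrestricted covariance matrix per cluster
      "shared"   - one unrestricted covariance matrix common to all clusters
      "diagonal" - a separate diagonal covariance matrix per cluster
    """
    # Mixing proportions sum to one, so one of them is determined by the rest.
    mixing = n_clusters - 1
    # One mean vector per cluster.
    means = n_clusters * n_variables
    # A symmetric p x p matrix has p(p + 1)/2 distinct elements.
    per_matrix = n_variables * (n_variables + 1) // 2

    if covariance == "full":
        cov = n_clusters * per_matrix
    elif covariance == "shared":
        cov = per_matrix
    elif covariance == "diagonal":
        cov = n_clusters * n_variables
    else:
        raise ValueError(f"unknown covariance structure: {covariance!r}")

    return mixing + means + cov


if __name__ == "__main__":
    # Three clusters, ten variables.
    for structure in ("full", "shared", "diagonal"):
        print(structure, n_free_parameters(3, 10, structure))
```

With three clusters and ten variables, the unrestricted model has 197 free parameters, against 87 with a shared covariance matrix and 62 with diagonal matrices. The covariance term grows quadratically in the number of variables, which is why unstructured models tend to need large samples.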

Figure 7.1 illustrates this second point. The general finite mixture model assumes that the variables in each subpopulation follow a multivariate distribution, with different mean vectors and, possibly, different covariance matrices. In the figure the subpopulation is indicated by a latent cluster variable (latent in the sense that it is not known a priori). The arrows pointing from ...
