Chapter 10

Ward’s Method of Cluster Analysis and Principal Components

10.1 Summarizing Data Sets

Data mining, or any form of statistical analysis for that matter, can be viewed as a set of methods to summarize large amounts of data so that we can usefully interpret the data. Collapsing the data by simply grouping it is common and useful. Cluster analysis and principal components are two broad classes of methods for grouping data. Since we will be using both in the tutorials that follow, a brief explanation of the difference is warranted.

Consider a typical data table that has one row for each individual (a row is also called a “case” or “record”). Across the top, we have the names of the variables (or “fields”) which describe the individuals, ...
