Chapter 10

Ward’s Method of Cluster Analysis and Principal Components

10.1 Summarizing Data Sets

Data mining, or any form of statistical analysis for that matter, can be viewed as a set of methods for summarizing large amounts of data so that we can interpret them usefully. Collapsing the data by grouping is a common and useful way to do this. Cluster analysis and principal components are two broad classes of grouping methods. Since we will be using both in the tutorials that follow, a brief explanation of the difference is warranted.

Consider a typical data table that has one row for each individual (a row is also called a “case” or “record”). Across the top are the names of the variables (or “fields”) that describe the individuals, ...
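
One common way to see the difference is that cluster analysis groups the rows (cases) of such a table, while principal components combines its columns (variables). The sketch below illustrates this on a tiny made-up table using only the Python standard library. The field names, the values, and the helper functions `centroid`, `ward_cost`, `ward_clusters`, and `first_component` are all hypothetical and are not taken from the book. For brevity, the variables are not standardized, which a real analysis would usually do first.

```python
from itertools import combinations

# Hypothetical table: one row per individual (case), one column per variable (field).
fields = ["age", "income", "visits"]
table = [
    [25.0, 30.0, 2.0],
    [27.0, 32.0, 3.0],
    [52.0, 80.0, 9.0],
    [55.0, 85.0, 10.0],
]

def centroid(rows):
    """Column-wise mean of a list of rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

def ward_clusters(rows, k):
    """Group ROWS: repeatedly merge the pair of clusters with the smallest Ward cost."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: ward_cost([rows[r] for r in clusters[p[0]]],
                                    [rows[r] for r in clusters[p[1]]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters

def first_component(rows, iters=200):
    """Combine COLUMNS: first principal component via power iteration."""
    means = centroid(rows)
    centered = [[x - m for x, m in zip(r, means)] for r in rows]
    n, p = len(rows), len(means)
    # Sample covariance matrix of the variables.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Score of each case on the component: one new column replacing several.
    scores = [sum(x * l for x, l in zip(c, v)) for c in centered]
    return v, scores

row_groups = ward_clusters(table, k=2)
loadings, scores = first_component(table)

print(row_groups)                   # [[0, 1], [2, 3]]
print(dict(zip(fields, loadings)))  # weight of each variable in the component
print(scores)                       # one summary value per case
```

In this sketch the clustering step returns groups of row indices, whereas the principal component returns one weight per column plus a single summary score for each case.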
