7
Cluster Analysis
Introduction
Cluster analysis is the process of grouping observations based on similarity (visually observed as proximity), connectedness, or density. The result of a cluster analysis is called a clustering.
Cluster analysis is similar in concept to the previously discussed process of classification. In classification, the observation groupings (classes) are known a priori, and the objective is to discover relationships between the other dataset attributes and the known class attribute that can be used to predict class membership. In cluster analysis, by contrast, the groupings are not known in advance. The objective is to discover clusters of observations grouped according to their attribute values.
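To make the idea of grouping without known labels concrete, the following is a minimal sketch rather than a method prescribed in this chapter. It groups two-dimensional points by connectedness: any two points at most `max_gap` apart are placed in the same cluster, and clusters are merged transitively. The function name `cluster_by_proximity`, the `max_gap` threshold, and the sample points are all illustrative assumptions.

```python
from math import dist  # Euclidean distance, Python 3.8+


def cluster_by_proximity(points, max_gap):
    """Return one cluster label per point (illustrative helper, not a library API).

    Two points share a cluster if they are at most max_gap apart, directly
    or through a chain of such neighbours (connectedness).
    """
    parent = list(range(len(points)))  # each point starts as its own cluster

    def find(i):
        # Follow parent links to the cluster representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every pair of points that are close enough.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= max_gap:
                parent[find(i)] = find(j)

    # Relabel representatives as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


# Hypothetical observations: no class attribute is supplied.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.8), (9.0, 1.0)]
labels = cluster_by_proximity(points, max_gap=1.0)
print(labels)
```

Note that the number of clusters is not specified anywhere; it emerges from the data and the chosen threshold. A classifier, in contrast, would be given the class of each training observation up front.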
In data mining, a cluster analysis may serve several objectives.
- Sub-population identification and isolation. As discussed in previous chapters, a dataset may be composed of observations drawn from populations with different characteristics. Relationships that hold only within a single subset are harder to identify when exploring the full dataset than when exploring that subset alone. Hence, a good rule of thumb is to isolate the subsets and then analyze each one individually. A common strategy in product marketing is to first segment the market and then develop specific promotions for selected market segments. The same principle may be applied to data mining: isolate subsets, then develop custom analysis plans for ...