Chapter 11

Cluster Analysis

Abstract

This chapter discusses the circumstances in which cluster analysis can be used. Starting from two observations, it calculates different distance (dissimilarity) measures for metric variables and similarity measures for binary variables. It then describes different hierarchical agglomeration schedules and explains how to interpret dendrograms in order to allocate the observations to each group. The nonhierarchical k-means agglomeration schedule is also studied, along with its differences from the hierarchical schedules. Finally, a cluster analysis is developed algebraically and with IBM SPSS Statistics Software and Stata Statistical Software, and the results are interpreted.
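As a rough illustration of the measures mentioned above, the following Python sketch computes two common distances between a pair of observations on metric variables and one common similarity coefficient for binary variables. The choice of Euclidean distance, Manhattan distance, and the simple matching coefficient is an assumption made for illustration, as are the function names and the sample values; the chapter itself may cover a different or wider set of measures.

```python
import math

# Hypothetical helper names and sample data, chosen only for illustration.

def euclidean(a, b):
    """Euclidean distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Manhattan (city-block) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def simple_matching(a, b):
    """Simple matching coefficient: share of binary variables that agree."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

# Two observations measured on three metric variables.
obs_1 = [3.0, 4.0, 1.0]
obs_2 = [0.0, 0.0, 1.0]

# Two observations measured on four binary variables.
bin_1 = [1, 0, 1, 1]
bin_2 = [1, 1, 1, 0]

d_euclid = euclidean(obs_1, obs_2)       # sqrt(9 + 16 + 0) = 5.0
d_manhattan = manhattan(obs_1, obs_2)    # 3 + 4 + 0 = 7.0
s_match = simple_matching(bin_1, bin_2)  # 2 agreements out of 4 = 0.5

print(d_euclid, d_manhattan, s_match)
```

A distance grows as observations become less alike, whereas a similarity coefficient grows as they become more alike, so the two kinds of measure are read in opposite directions when deciding which observations to group first.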
