This chapter discusses the circumstances in which cluster analysis can be used. Taking the observations two at a time, different distance (dissimilarity) measures for metric variables and similarity measures for binary variables are calculated. Different hierarchical agglomeration schedules are described, as is the interpretation of dendrograms for allocating the observations to groups. The nonhierarchical k-means agglomeration schedule, and how it differs from hierarchical schedules, will also be studied. Finally, we will develop a cluster analysis algebraically and by using IBM SPSS Statistics Software and Stata Statistical Software, and then interpret the results.
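To make the idea of pairwise distance and similarity measures concrete, the following minimal Python sketch computes a few common ones between two observations. It assumes Euclidean and Manhattan distances for metric variables and simple matching and Jaccard coefficients for binary (0/1) variables. The function names and sample data are hypothetical illustrations, and the chapter's own set of measures and notation may differ.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two observations measured on metric variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Manhattan (city-block) distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def simple_matching(x, y):
    """Simple matching similarity for binary variables: share of variables that agree (1-1 or 0-0)."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)


def jaccard(x, y):
    """Jaccard similarity for binary variables: 1-1 agreements over variables where at least one is 1.

    Unlike simple matching, 0-0 agreements are ignored. Returning 1.0 when both
    observations are all zeros is an assumed convention, not a universal rule.
    """
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return both / either if either else 1.0


if __name__ == "__main__":
    # Hypothetical observations on two metric variables.
    obs_a, obs_b = [3.0, 7.0], [6.0, 3.0]
    print("Euclidean:", euclidean(obs_a, obs_b))   # sqrt(3^2 + 4^2) = 5.0
    print("Manhattan:", manhattan(obs_a, obs_b))   # 3 + 4 = 7.0

    # Hypothetical observations on five binary variables.
    bin_a, bin_b = [1, 0, 0, 1, 0], [1, 1, 0, 1, 0]
    print("Simple matching:", simple_matching(bin_a, bin_b))  # 4 agreements / 5 = 0.8
    print("Jaccard:", jaccard(bin_a, bin_b))                  # 2 / 3
```

The two binary coefficients differ here only because simple matching counts the 0-0 agreements while Jaccard does not, which is why the choice of similarity measure can change how observations are grouped.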