Unsupervised cluster analysis refers to algorithms that aim to produce homogeneous groups of cases from unlabeled data. The algorithm does not know beforehand which group each case belongs to; its goal is to uncover the structure of the data from the similarities (or differences) between cases. A cluster is a group of cases, observations, individuals, or other units that are similar to each other on the characteristics considered. These characteristics can be anything measurable or observable. The choice of characteristics, or attributes, matters, because different attributes lead to different clusters.
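To make the last point concrete, here is a minimal sketch in Python. The four cases, their attribute values, and the helper `two_clusters` are all invented for illustration. The helper splits the cases into two groups at the largest gap in a single numeric attribute, which for one attribute should match single-linkage clustering cut at two clusters. It is a toy stand-in, not one of the algorithms covered in this chapter.

```python
def two_clusters(cases, attribute):
    """Split cases into two groups at the largest gap in one numeric attribute.

    Hypothetical helper for illustration only; assumes at least two cases.
    """
    # Order the case names by the value of the chosen attribute.
    ordered = sorted(cases, key=lambda name: cases[name][attribute])
    values = [cases[name][attribute] for name in ordered]
    # Distances between neighbouring values; the widest one separates the groups.
    gaps = [values[i + 1] - values[i] for i in range(len(values) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return set(ordered[:cut]), set(ordered[cut:])


# Invented data: age in years, income in thousands per year.
cases = {
    "ann": {"age": 23, "income": 82},
    "bob": {"age": 25, "income": 30},
    "cat": {"age": 61, "income": 85},
    "dan": {"age": 64, "income": 28},
}

by_age = two_clusters(cases, "age")        # groups ann with bob, cat with dan
by_income = two_clusters(cases, "income")  # groups bob with dan, ann with cat

print(by_age)
print(by_income)
```

The same four cases fall into different groups depending on whether age or income is used, even though the grouping rule itself is unchanged.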
In this chapter, we will discuss the following topics: