Clustering and classification are two related activities whose names are sometimes used as synonyms. In clustering, the goal is to identify, in a given set of units, groups (clusters, classes) of (usually) similar units. In classification, a given unit has to be assigned to the corresponding (predefined) group. Both activities are embedded in our language and are therefore basic to most of our daily tasks.
The earliest classification systems were taxonomies of animals and plants: Shen Nung (China, 3000 BCE) and the Ebers Papyrus (Egypt, 1500 BCE). A theoretical framework was proposed by Aristotle (384–322 BCE). Taxonomic systems were later refined by Linnaeus (1707–1778) and Darwin (1809–1882), and further by the discovery of the structure of DNA (1953) and the PhyloCode (1998).
The first steps towards “numeric” clustering procedures were taken in the first half of the 20th century by defining various (dis)similarity measures, such as the Czekanowski coefficient (1909), the coefficient of racial likeness (Pearson, 1926), and the generalized distance (Mahalanobis, 1936). Early methods were proposed within biometrics and psychometrics by Driver and Kroeber (1932), Forbes (1933), Zubin (1938), Sturtevant (1939), and others. Kruskal's ...