Measurement of Proximity
Of central importance in attempting to identify clusters of observations which may be present in data is knowledge on how ‘close’ individuals are to each other, or how far apart they are. Many clustering investigations have as their starting point an n × n one-mode matrix, the elements of which reflect, in some sense, a quantitative measure of closeness, more commonly referred to in this context as dissimilarity, distance or similarity, with a general term being proximity. Two individuals are ‘close’ when their dissimilarity or distance is small or their similarity large. Proximities can be determined either directly or indirectly, although the latter is more common in most applications. Directly determined proximities are illustrated by the cola tasting experiment described in Chapter 1; they occur most often in areas such as psychology and market research.
Indirect proximities are usually derived from the n × p multivariate (two-mode) matrix, X, introduced in Chapter 1. There is a vast range of possible proximity measures, many of which we will meet in this chapter. But as an initial example, Table 3.1 shows data concerning crime rates of seven offences (p = 7) for 16 cities in the USA (n = 16), with an accompanying dissimilarity matrix, the elements of which are calculated as the Euclidean distances between cities (see Section 3.3) after scaling each crime variable to unit variance (a technique that will be discussed in Section ...