3 Dissimilarity, Similarity, and Distance Measures

Many clustering techniques use dissimilarity, similarity, or distance measures as a basis for forming appropriate clusters. Their role in clustering is covered in Chapters –8. In this chapter, we first consider some general features of these measures. We then go on to discuss these measures in detail for two types of symbolic data, namely, non‐modal multi‐valued (i.e., list) data and interval data, and in Chapter we give the details for measures for modal valued observations, both modal multi‐valued (or modal list) data and histogram‐valued data. In these cases, all variables are assumed to be of the same type (e.g., all interval‐valued). However, this is not a necessary restriction, as seen in the mixed data examples of later chapters (e.g., Example 7.14).

3.1 Some General Basic Definitions

We start with a set of objects. The aim in clustering is to form clusters such that those objects that are most alike are clustered together and those that are most unalike are in different clusters. Thus, the purpose of a dissimilarity measure is to calculate how much two particular objects are not alike, or, if we want to calculate ...
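To make the idea of "how much two objects are not alike" concrete, the following is a minimal sketch in Python. It is not taken from this chapter: it assumes the Jaccard dissimilarity as one possible measure for list (multi-valued) observations and the Hausdorff distance as one possible measure for interval observations. The function names and the sample observations are hypothetical, chosen only for illustration.

```python
def jaccard_dissimilarity(a, b):
    """Jaccard dissimilarity between two list-valued observations, treated as sets.

    Assumption: this is one common choice of measure, not necessarily
    the one developed later in the chapter.
    """
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty observations are treated as identical
    return 1.0 - len(a & b) / len(union)


def hausdorff_interval_distance(u, v):
    """Hausdorff distance between two closed intervals given as (lower, upper)."""
    return max(abs(u[0] - v[0]), abs(u[1] - v[1]))


# Hypothetical observations, invented for illustration only.
A = ["gas", "electric"]
B = ["gas", "wood"]

print(jaccard_dissimilarity(A, A))  # an object is not at all unlike itself
print(jaccard_dissimilarity(A, B))  # one shared category out of three distinct ones
print(hausdorff_interval_distance((1.0, 4.0), (2.0, 7.0)))
```

Both functions return zero when an object is compared with itself and give the same value whichever object is listed first, which are the properties one would ordinarily expect of a dissimilarity measure.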
