1 Introduction

The theme of this volume centers on clustering methodologies for data in which observations are described by lists, intervals, histograms, and the like (referred to as “symbolic” data), instead of by single point values (traditional “classical” data). Clustering techniques are used frequently in exploratory data analyses when the goal is to identify classes in a data set. Often these classes are in and of themselves the goal of an analysis, but they can also become the starting point(s) of subsequent analyses. There are many texts available which focus on clustering for classically valued observations. This volume aims to provide one such outlet for symbolic data.
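To make the distinction between classical and symbolic values concrete, the following is a minimal sketch in Python. The records, the variable names (`city`, `age`, `transport`), and the helper `to_symbolic` are all made up for illustration; they are not taken from the volume. The sketch assumes that symbolic values are obtained by grouping classical point values, which is only one way such data can arise.

```python
from collections import Counter

# Hypothetical classical data: each observation holds one point value per variable.
records = [
    {"city": "A", "age": 23, "transport": "bus"},
    {"city": "A", "age": 41, "transport": "car"},
    {"city": "A", "age": 35, "transport": "bus"},
    {"city": "B", "age": 52, "transport": "car"},
    {"city": "B", "age": 60, "transport": "car"},
]


def to_symbolic(rows, key):
    """Group classical rows by `key` and describe each group symbolically."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)

    symbolic = {}
    for name, members in groups.items():
        ages = [m["age"] for m in members]
        counts = Counter(m["transport"] for m in members)
        symbolic[name] = {
            # Interval-valued: the range of ages observed in the group.
            "age_interval": (min(ages), max(ages)),
            # List-valued: the set of categories observed in the group.
            "transport_list": sorted(counts),
            # Histogram-valued: relative frequency of each category.
            "transport_hist": {k: v / len(members) for k, v in counts.items()},
        }
    return symbolic


symbolic_data = to_symbolic(records, "city")
for name, description in symbolic_data.items():
    print(name, description)
```

Each resulting observation (here, a city) is no longer a single point but an interval, a list, and a histogram, which is the kind of value a symbolic clustering method has to handle.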

With the capabilities of the modern computer, large and extremely large data sets are becoming more routine. What is less routine is how to analyze these data. Data sets are becoming so large that, even with the increased computational power of today, direct analyses through the myriad of classical procedures developed over the past century alone are not possible; for example, from Stirling's formula, the number of partitions of a data set of only 50 units is approximately $1.86 \times 10^{47}$. As a consequence, subsets of aggregated data are determined for subsequent analyses. The criteria for these aggregations, and the directions they take, would typically be driven by the underlying scientific questions pertaining ...
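The size of the partition count for 50 units can be checked directly. The sketch below is an illustration rather than the computation used in the text: instead of an approximation via Stirling's formula, it computes the exact number of partitions of an n-element set (the Bell number) with the Bell triangle recurrence. The function name `bell_number` is chosen here for illustration.

```python
def bell_number(n):
    """Return the number of partitions of a set of n elements (Bell number).

    Uses the Bell triangle: each row starts with the last entry of the
    previous row, and each further entry is the sum of the entry to its
    left and the entry above that one.
    """
    row = [1]  # row for n = 0; its first entry is B_0 = 1
    for _ in range(n):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


# Python integers have arbitrary precision, so the exact count is available.
partitions_50 = bell_number(50)
print(partitions_50)
print(f"{partitions_50:.3e}")
```

For n = 50 the result is on the order of 10^47, which is why exhaustively examining all partitions is not feasible even for a small data set.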
