Chapter 9

Clustering High-Dimensional Data

Arthur Zimek

University of Alberta, Edmonton

9.1 Introduction

The task of clustering is commonly defined as finding a set of groups of similar objects within a data set, while keeping dissimilar objects separated, either in different groups or in a group of noise. Estivill-Castro criticizes this definition for including a grouping criterion [47]. Yet this criterion, similarity, is exactly what is in question among many different approaches. Especially in high-dimensional data, the meaning and definition of similarity lie at the heart of the problem. In many cases, the similarity of objects is assessed within subspaces, e.g., using a subset of the dimensions ...
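As a minimal illustration of what assessing similarity within a subspace can mean, the sketch below assumes Euclidean distance and restricts it to a chosen subset of attribute indices. The function names `subspace_distance` and `full_distance` and the two toy objects are hypothetical. They only show the idea and are not the method of any particular approach discussed in this chapter.

```python
import math
from typing import Iterable, Sequence


def subspace_distance(x: Sequence[float], y: Sequence[float], dims: Iterable[int]) -> float:
    """Euclidean distance between x and y, using only the attributes listed in dims.

    Assumption for illustration: similarity is measured as Euclidean distance.
    """
    return math.sqrt(sum((x[d] - y[d]) ** 2 for d in dims))


def full_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance in the full space, i.e., the subspace of all attributes."""
    return subspace_distance(x, y, range(len(x)))


# Two hypothetical 4-dimensional objects: nearly identical in dimensions 0 and 1,
# but far apart in dimensions 2 and 3.
a = [1.0, 2.0, 0.0, 10.0]
b = [1.1, 2.1, 9.0, 0.0]

print(subspace_distance(a, b, [0, 1]))  # small: similar in the subspace {0, 1}
print(subspace_distance(a, b, [2, 3]))  # large: dissimilar in the subspace {2, 3}
print(full_distance(a, b))              # large: the full space is dominated by {2, 3}
```

Here `a` and `b` are close in the subspace spanned by dimensions 0 and 1 (distance about 0.14) but far apart in the full space (about 13.45). Whether they should share a cluster therefore depends on which dimensions the similarity measure takes into account.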
