Chapter 12

Clustering Categorical Data

Bill Andreopoulos

Lawrence Berkeley National Laboratory
Berkeley, CA
billandreo@gmail.com

12.1 Introduction

A growing number of clustering algorithms for categorical data have been proposed in recent years, along with interesting applications, such as partitioning large software systems [8, 9] and protein interaction data [13, 21, 38, 77].

A categorical dataset with m attributes is viewed as an m-dimensional “cube”, offering a spatial density basis for clustering. A cell of the cube is mapped to the number of objects having values equal to its coordinates. Clusters in such a cube are regarded as subspaces of high object density and are separated by subspaces of low object density. Clustering the cube poses ...
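The cube view described above can be illustrated with a minimal sketch. It assumes each object is a tuple of m categorical values and stores only the occupied cells, mapping each cell's coordinates to the number of objects that fall in it. The function name `build_cube` and the toy dataset are hypothetical and are not taken from the chapter.

```python
from collections import Counter


def build_cube(objects):
    """Map each occupied cell (a tuple of attribute values) to its object count.

    Hypothetical helper for illustration: cells that hold no objects are not
    stored, and looking them up in the Counter yields a count of 0.
    """
    return Counter(tuple(obj) for obj in objects)


# Hypothetical toy dataset with m = 3 categorical attributes.
objects = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
]

cube = build_cube(objects)

# Print each occupied cell together with its object count.
for cell, count in cube.items():
    print(cell, count)
```

A sparse mapping is used here because the full cube has one cell per combination of attribute values, and most of those cells are typically empty. Cells with high counts are the candidates for the dense subspaces that the text regards as clusters.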
