Lawrence Berkeley National Laboratory, Berkeley, CA
billandreo@gmail.com
A growing number of clustering algorithms for categorical data have been proposed in recent years, along with interesting applications, such as partitioning large software systems [8, 9] and protein interaction data [13, 21, 38, 77].
A categorical dataset with m attributes is viewed as an m-dimensional “cube”, offering a spatial density basis for clustering. Each cell of the cube is mapped to the number of objects whose attribute values equal the cell's coordinates. Clusters in such a cube are regarded as subspaces of high object density, separated from one another by subspaces of low object density. Clustering the cube poses ...
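The cube view described above can be illustrated with a minimal sketch. It assumes each object is a tuple of m categorical values and stores only the non-empty cells in a sparse counter. The names `build_cube` and `dense_cells`, the sample data, and the fixed `min_count` threshold are hypothetical, introduced only for illustration; they are not the clustering method of this paper.

```python
from collections import Counter


def build_cube(objects):
    """Map each cell (a tuple of m attribute values) to its object count.

    Only non-empty cells are stored; any other cell has count 0.
    """
    return Counter(tuple(obj) for obj in objects)


def dense_cells(cube, min_count):
    """Return the cells holding at least `min_count` objects.

    `min_count` is an illustrative density threshold (an assumption).
    """
    return {cell: n for cell, n in cube.items() if n >= min_count}


# Hypothetical dataset with m = 3 attributes: colour, size, shape.
objects = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "square"),
]

cube = build_cube(objects)
dense = dense_cells(cube, min_count=2)
```

Here the cell with coordinates `("red", "small", "round")` holds two objects, while a cell that no object falls into has a count of zero. A sparse mapping is used because the full cube has one cell per combination of attribute values, most of which are typically empty.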