Chapter 8

Graph-based Clustering

8.1 Introduction

Graph-based clustering or network clustering is the process of clustering data or

objects represented as a graph. Objects or data points are shown by the vertices of

the graph and there is an edge between the objects if they are related according to

some metric. In data clustering, we did not require edges between the objects to be

clustered whereas in graph clustering, we will specify the neighbors of an entity and

these neighbors will have a relation to the object. We will distinguish two cases of

clustering where the objects belong to the same cluster if they are closer to each

other based on a given objective function in distance-based clustering; or conceptual

clustering where clusters are formed based on the description of the objects. We will

assume the former and also the graph representing the network is simple, undirected,

weighted or unweighted.

Another distinction is whether the clusters formed overlap or not. In graph parti-

tioning, clusters do not overlap, that is, they do not have common members whereas

they may overlap in graph clustering. Whether the number of clusters is determined

beforehand or not also affects the design of clustering algorithms. If the number

of clusters is not known a priori, we need to specify an objective function to test

and when this function reaches a desired value, the algorithm terminates. There are

various parameters called validation indices to evaluate the quality (goodness) of a

graph-based clustering. One such measure which is also applicable in data clustering

is based on the intra-cluster distance which is the average distance of the vertices to

the cluster center c

i

deﬁned as below :

d

intra

=

1

n

k

X

i=1

X

v∈C

i

d(v,c

i

) (8.1)

143

Start Free Trial

No credit card required