
Chapter 6
Clustering from Data Streams
Roughly speaking, clustering is the process of partitioning objects into
groups such that objects within the same group are highly similar, while
objects in different groups are dissimilar. Clustering methods are widely used
in data mining, either to gain insight into the data distribution or
as a preprocessing step for other algorithms. The most common approaches
use the distance between examples as the similarity criterion. These approaches require
space that is quadratic in the number of observations, which is prohibitive in
the data stream paradigm.
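To make the quadratic space requirement concrete, the following minimal sketch stores the distance between every unordered pair of examples. It assumes Euclidean distance and in-memory tuples of coordinates; the function names `pairwise_distances` and `pair_count` are illustrative and do not come from the text.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Return the Euclidean distance for every unordered pair of points.

    Illustrative helper (not from the text): the result holds
    n * (n - 1) / 2 entries, so its size grows quadratically with n.
    """
    return {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }


def pair_count(n):
    """Number of unordered pairs among n observations."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    sample = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    distances = pairwise_distances(sample)
    print(len(distances), "distances stored for", len(sample), "points")
    # The count alone shows why this does not scale to unbounded streams.
    for n in (1_000, 1_000_000):
        print(n, "observations ->", pair_count(n), "pairwise distances")
```

Because a stream has no fixed upper bound on the number of observations, any structure whose size grows with the square of that number cannot be kept in memory indefinitely.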
The data stream clustering problem is defined as to maintain