Charu C. Aggarwal
IBM T. J. Watson Research Center
Yorktown Heights, NY
charu@us.ibm.com
The problem of data clustering has been widely studied in the data mining and machine learning literature because of its numerous applications to summarization, learning, segmentation, and target marketing [46, 47, 52]. In the absence of specific labeled information, clustering can be considered a concise model of the data that can be interpreted as either a summary or a generative model. The basic problem of clustering may be stated as follows:
Given a set of data points, partition them into a set of groups which are as similar as possible.
Note that this is a very rough definition, ...