IBM T. J. Watson Research CenterYorktown Heights, NYhtong@us.ibm.com
With the advance of Web2.0, the data size is increasing explosively. For example, Twitter data spans several terabytes; Wikipedia data (e.g., articles and authors) is of similar size; web click-through data is reported to reach petabyte scale ; Yahoo! web graph in 2002 has more than 1 billion nodes and almost 7 billion edges .
On the other hand, many data clustering algorithms have a high intrinsic time complexity. For example, the classic k-means clustering is NP-hard even when k = 2. The normalized cut (NCut), a representative spectral clustering algorithm, is also ...