Data Clustering by Chandan K. Reddy, Charu C. Aggarwal

Chapter 11

Big Data Clustering

Hanghang Tong

IBM T. J. Watson Research Center
Yorktown Heights, NY
htong@us.ibm.com

U Kang

KAIST
Seoul, Korea
ukang@cs.cmu.edu

11.1 Introduction

With the advance of Web 2.0, data sizes are growing explosively. For example, Twitter data spans several terabytes; Wikipedia data (e.g., articles and authors) is of similar size; web click-through data is reported to reach petabyte scale [36]; and the Yahoo! web graph in 2002 had more than 1 billion nodes and almost 7 billion edges [27].

On the other hand, many data clustering algorithms have a high intrinsic time complexity. For example, the classic k-means clustering problem is NP-hard even when k = 2. The normalized cut (NCut), a representative spectral clustering algorithm, is also ...
