Chapter 12. Real-world applications of clustering

This chapter covers

  • Clustering like-minded people on Twitter
  • Suggesting tags for an artist on Last.fm using clustering
  • Creating a related-posts feature for a website

You probably picked up this book to learn how clustering can be applied to real-world problems. So far we’ve mostly focused on clustering the Reuters news data set, which has around 20,000 documents of about 1,000 to 2,000 words each. A data set of that size isn’t challenging enough for Mahout to show its ability to scale. In this chapter, we use clustering to solve three interesting problems on much larger data sets.

First, we attempt to use the public tweets from Twitter (http://twitter.com) to find ...
