Chapter 6. Distributing recommendation computations

This chapter covers

  • Analyzing a massive data set from Wikipedia
  • Producing recommendations with Hadoop and distributed algorithms
  • Pseudo-distributing existing nondistributed recommenders

This book has looked at increasingly large data sets: from tens of preferences, to 100,000, to 10 million, and then 17 million. But even that is only medium-sized in the world of recommenders. This chapter ups the ante again by tackling a larger data set of 130 million preferences in the form of article-to-article links from Wikipedia’s massive corpus.[1] In this data set, the articles are both the users and the items, which also demonstrates how recommenders can be usefully applied, with Mahout, to less ...
