Chapter 6. Distributing recommendation computations

This chapter covers

  • Analyzing a massive data set from Wikipedia
  • Producing recommendations with Hadoop and distributed algorithms
  • Pseudo-distributing existing nondistributed recommenders

This book has looked at increasingly large data sets: from tens of preferences, to 100,000, to 10 million, and then 17 million. But even that is only medium-sized in the world of recommenders. This chapter ups the ante again by tackling a larger data set of 130 million preferences in the form of article-to-article links from Wikipedia’s massive corpus.[1] In this data set, the articles are both the users and the items, which also demonstrates how recommenders can be usefully applied, with Mahout, to less ...
