Top-k MapReduce implementation  

The top-k algorithm is a popular MapReduce pattern. Each mapper emits the top-k records from its own portion of the input, and the reducer then filters the final top-k records out of everything it receives from the mappers. We will reuse the player score example from earlier. The objective is to find the top-k players with the lowest scores. Let's look at the mapper implementation. We assume that each player has a unique score. If scores can repeat, the logic needs a small change: each value must hold a list of players' details, and the cleanup method must still emit only 10 records.

The code for TopKMapper can be seen as follows:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
...
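The listing above is cut off, so the following is only a minimal sketch of the idea described earlier, written in plain Java without any Hadoop classes. It is not the book's TopKMapper. The class name TopKSketch, the methods lowestK and reduce, and the sample players and scores are all hypothetical. It assumes unique scores, as the text does, and keeps candidates in a TreeMap keyed by score so that the highest score can be evicted whenever more than k entries are held.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the mapper/reducer pair; no Hadoop dependency.
class TopKSketch {

    // Mapper-side step: keep only the k lowest scores seen in one input split.
    // Assumes every score is unique, so the score can serve as the map key.
    static TreeMap<Integer, String> lowestK(Map<Integer, String> records, int k) {
        TreeMap<Integer, String> best = new TreeMap<>();
        for (Map.Entry<Integer, String> e : records.entrySet()) {
            best.put(e.getKey(), e.getValue());
            if (best.size() > k) {
                best.remove(best.lastKey()); // evict the highest score
            }
        }
        return best; // in a real mapper this would be emitted from cleanup()
    }

    // Reducer-side step: merge every mapper's local result and filter again.
    static TreeMap<Integer, String> reduce(List<TreeMap<Integer, String>> mapperOutputs, int k) {
        TreeMap<Integer, String> merged = new TreeMap<>();
        for (TreeMap<Integer, String> output : mapperOutputs) {
            merged.putAll(output);
        }
        return lowestK(merged, k);
    }

    public static void main(String[] args) {
        // Two made-up input splits of (score, player) records.
        TreeMap<Integer, String> splitA = new TreeMap<>();
        splitA.put(70, "p1");
        splitA.put(65, "p2");
        splitA.put(72, "p3");

        TreeMap<Integer, String> splitB = new TreeMap<>();
        splitB.put(68, "p4");
        splitB.put(80, "p5");
        splitB.put(66, "p6");

        List<TreeMap<Integer, String>> outputs = new ArrayList<>();
        outputs.add(lowestK(splitA, 2));
        outputs.add(lowestK(splitB, 2));

        System.out.println(reduce(outputs, 2)); // prints {65=p2, 66=p6}
    }
}
```

Because each mapper forwards at most k records, the reducer handles at most k records per mapper rather than the whole data set, which is the main appeal of the pattern.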
