O'Reilly logo

Data Algorithms by Mahmoud Parsian

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 2. Secondary Sort: A Detailed Example

The MapReduce framework sorts input to reducers by key, but values of reducers are arbitrarily ordered. This means that if all mappers generate the following key-value pairs for key = K:

  • (K, V1), (K, V2), ..., (K, Vn)

then all these values {V1, V2, ..., Vn} will be processed by a single reducer (for key = K), but there will be no order (ascending or descending) between instances of Vi. As you learned in Chapter 1, Secondary Sort is a design pattern we can use to apply an order (such as “ascending sort” or “descending sort”) to the values. How do we accomplish this? Say we want to apply some order to the reducer values:

  • S1 ≤ S2 ≤ ... ≤ Sn

or:

  • S1 ≥ S2 ≥ ... ≥ Sn

where Si{V1, V2, ..., Vn} for i = {1, 2, ..., n}. Note that each Vi might be a simple data type, such as String or Integer, or a tuple (more than a single value—that is, a composite object).

There are two ways to sort reducer values:

Solution #1

Buffer reducer values in memory, then sort. If the number of reducer values is small enough to fit in memory (per reducer), then this solution will work. But if the number of reducer values is high, then they might not fit in memory (not a preferable solution). Implementation of this solution is simple; it is presented in Chapter 1 and will not be discussed in this chapter.

Solution #2

Use the Secondary Sort design pattern of the MapReduce framework, and reducer values will arrive sorted (i.e., there’s no need to sort values ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required