The MapReduce framework sorts the input to reducers by key, but the values for each key arrive in arbitrary order. This means that if all mappers generate the following key-value pairs for `key` = `K`:

$(K, V_1), (K, V_2), \ldots, (K, V_n)$

then all these values $\{V_1, V_2, \ldots, V_n\}$ will be processed by a single reducer (for `key` = `K`), but there will be no order (ascending or descending) between instances of $V_i$. As you learned in Chapter 1, Secondary Sort is a design pattern we can use to apply an order (such as “ascending sort” or “descending sort”) to the values. How do we accomplish this? Say we want to apply some order to the reducer values:

$S_1 \le S_2 \le \ldots \le S_n$

or:

$S_1 \ge S_2 \ge \ldots \ge S_n$

where $S_i \in \{V_1, V_2, \ldots, V_n\}$ for $i = 1, 2, \ldots, n$. Note that each $V_i$ might be a simple data type, such as `String` or `Integer`, or a tuple (that is, a composite object holding more than a single value).
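To make the setup concrete, here is a minimal sketch that imitates the shuffle phase in plain Python rather than through any real MapReduce API. The function name `shuffle` and the sample records are hypothetical, and arrival order stands in for the "arbitrary" order a real framework would produce. The point is only that keys come out sorted while each key's values do not.

```python
from collections import defaultdict

def shuffle(mapper_output):
    """Group (key, value) pairs by key: keys sorted, values left unsorted."""
    groups = defaultdict(list)
    for key, value in mapper_output:
        groups[key].append(value)
    # Keys are sorted, as the framework guarantees; values are not touched.
    return [(key, groups[key]) for key in sorted(groups)]

# Hypothetical mapper output: (key, value) pairs in emission order.
pairs = [("K2", 7), ("K1", 30), ("K1", 10), ("K2", 3), ("K1", 20)]
reducer_input = shuffle(pairs)
print(reducer_input)
```

Each reducer call here receives one key and its list of values, and nothing in that list is guaranteed to be ascending or descending.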

There are two ways to sort reducer values:

- Solution #1
Buffer reducer values in memory, then sort. If the number of reducer values is small enough to fit in memory (per reducer), then this solution will work. But if the number of reducer values is high, they might not fit in memory, so this solution does not scale and is not the preferred one. Implementation of this solution is simple; it is presented in Chapter 1 and will not be discussed in this chapter.

- Solution #2
Use the Secondary Sort design pattern of the MapReduce framework, and reducer values will arrive sorted (i.e., there’s no need to sort values ...
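The two solutions can be contrasted with a small sketch. This is plain Python imitating the framework, not Hadoop or Spark code; `reduce_with_buffer`, `secondary_sort_shuffle`, and the sample data are hypothetical names chosen for illustration. The second function assumes the composite-key form of the pattern, in which the value is made part of the sort key so that the framework's own sort orders it, while grouping still uses only the natural key.

```python
from itertools import groupby

def reduce_with_buffer(key, values):
    """Solution #1: buffer all values for one key in memory, then sort."""
    buffered = list(values)  # the whole value list must fit in memory
    buffered.sort()
    return key, buffered

def secondary_sort_shuffle(mapper_output):
    """Solution #2 (simulated): sort on the composite key (key, value),
    then group on the natural key only, so values arrive already sorted."""
    composite_sorted = sorted(mapper_output)  # stands in for the framework's sort
    for key, group in groupby(composite_sorted, key=lambda kv: kv[0]):
        # Values are already in order here; the list is built only for display.
        yield key, [value for _, value in group]

# Hypothetical mapper output: (key, value) pairs in emission order.
pairs = [("K2", 7), ("K1", 30), ("K1", 10), ("K2", 3), ("K1", 20)]

solution_1 = reduce_with_buffer("K1", [30, 10, 20])
solution_2 = list(secondary_sort_shuffle(pairs))
print(solution_1)
print(solution_2)
```

In this sketch both approaches yield the same ordering for `K1`. The difference is where the sorting work happens: inside a single reducer's memory in the first case, and in the framework's sort phase in the second.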
