Chapter 5. Order Inversion
This chapter focuses on the Order Inversion (OI) design pattern, which controls the order in which values reach a reducer in the MapReduce framework. This is useful because some computations require ordered data. Typically, the OI pattern is applied during the data analysis phase. In Hadoop and Spark, the order of values arriving at a reducer is undefined: there is no order unless we exploit the sorting phase of MapReduce to push the data needed for calculations to the reducer. The OI pattern works for pair patterns, which use simpler data structures and require less reducer memory because no additional sorting or ordering of values is needed in the reducer phase.
To help you understand the OI pattern, we’ll start with a simple example. Consider a reducer with a composite key of (K1, K2) and assume that K1 is the natural key component of the composite key. Say this reducer receives the following values (there is no ordering between these values):
V1, V2, ..., Vn
By implementing the OI pattern, we can sort and classify the values arriving at the reducer with a key of (K1, K2). The sole purpose of the OI pattern is to properly sequence the data presented to the reducer. To demonstrate the OI design pattern, we’ll assume that K1 is a fixed part of the composite key and that K2 takes only three distinct values (the number is arbitrary), {K2a, K2b, K2c}, which generate the values shown in Table 5-1. (Note that we have to ...
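The sequencing idea can be sketched as a small in-memory simulation in plain Python, with no Hadoop or Spark API involved. It assumes the framework sorts records by the full composite key (K1, K2) while grouping them for the reducer by the natural key K1 alone. The function names `shuffle_sort` and `reduce_by_natural_key`, and the assignment of values V1 through V6 to particular K2 values, are hypothetical and chosen only for illustration; they are not taken from Table 5-1.

```python
from itertools import groupby


def shuffle_sort(pairs):
    """Stand-in for the framework's sort phase: order records by the
    full composite key (K1, K2). Python's sort is stable, so records
    with equal keys keep their emission order."""
    return sorted(pairs, key=lambda kv: kv[0])


def reduce_by_natural_key(sorted_pairs):
    """Group on the natural key K1 only, so each group's values are
    already sequenced by K2 when the 'reducer' sees them."""
    result = {}
    for k1, group in groupby(sorted_pairs, key=lambda kv: kv[0][0]):
        result[k1] = [(key[1], value) for key, value in group]
    return result


# Hypothetical mapper output, deliberately emitted out of order.
emitted = [
    (("K1", "K2c"), "V5"),
    (("K1", "K2a"), "V1"),
    (("K1", "K2b"), "V3"),
    (("K1", "K2a"), "V2"),
    (("K1", "K2c"), "V6"),
    (("K1", "K2b"), "V4"),
]

ordered = reduce_by_natural_key(shuffle_sort(emitted))
for k2, value in ordered["K1"]:
    print(k2, value)
```

In this sketch the reducer does no sorting of its own: the values for K1 arrive grouped as K2a, then K2b, then K2c, because the ordering work was done on the keys before the reduce step.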