Data Algorithms

by Mahmoud Parsian
July 2015
Content level: Intermediate to advanced
778 pages
17h 9m
English
O'Reilly Media, Inc.
Content preview from Data Algorithms

Chapter 5. Order Inversion

This chapter focuses on the Order Inversion (OI) design pattern, which controls the order in which values reach a reducer in the MapReduce framework. This matters because some computations require ordered data. Typically, the OI pattern is applied during the data analysis phase. In Hadoop and Spark, the order of values arriving at a reducer is undefined: there is no order unless we exploit the sorting phase of MapReduce to push the data needed for calculations to the reducer. The OI pattern works for pair patterns, which use simpler data structures and require less reducer memory because the reducer does no additional sorting or ordering of its values.

To help you understand the OI pattern, we’ll start with a simple example. Consider a reducer with a composite key of (K1, K2) and assume that K1 is the natural key component of the composite key. Say this reducer receives the following values (there is no ordering between these values):

  • V1, V2, ..., Vn
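The setup above can be sketched in plain Python. This is only a simulation of the idea, not Hadoop or Spark code: the function name `shuffle_and_sort`, the sample records, and the pairing of each value with a particular K2 are all hypothetical, and the sketch assumes that records are grouped on the natural key K1 while the sort runs over the full composite key (K1, K2).

```python
from itertools import groupby
from operator import itemgetter


def shuffle_and_sort(mapper_output):
    """Simulate a shuffle that delivers values to a reducer in K2 order.

    Each record is ((k1, k2), value). This is an illustrative stand-in
    for a framework's sort phase, not a real Hadoop or Spark API.
    """
    # Sort on the whole composite key (K1, K2). Python's sort is stable,
    # so records with equal keys keep their arrival order.
    ordered = sorted(mapper_output, key=itemgetter(0))
    # Group on the natural key K1 only (assumption: one reducer call per K1).
    for k1, group in groupby(ordered, key=lambda record: record[0][0]):
        yield k1, [(k2, value) for (_, k2), value in group]


# Hypothetical mapper output, arriving in no particular order.
mapper_output = [
    (("K1", "K2c"), "V5"),
    (("K1", "K2a"), "V1"),
    (("K1", "K2b"), "V3"),
    (("K1", "K2a"), "V2"),
    (("K1", "K2c"), "V6"),
    (("K1", "K2b"), "V4"),
]

reducer_input = dict(shuffle_and_sort(mapper_output))
```

In this sketch the reducer for K1 sees every K2a value first, then K2b, then K2c, without doing any sorting of its own.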

By implementing the OI pattern, we can sort and classify the values arriving at the reducer with a key of (K1, K2). The sole purpose of the OI pattern is to properly sequence the data presented to the reducer. To demonstrate the OI design pattern, we’ll assume that K1 is a fixed part of the composite key and that K2 has only three distinct values (the number is arbitrary), {K2a, K2b, K2c}, which generate the values shown in Table 5-1. (Note that we have to ...


Publisher Resources

ISBN: 9781491906170