Skip to Content
Data Algorithms
book

Data Algorithms

by Mahmoud Parsian
July 2015
Intermediate to advanced
778 pages
17h 9m
English
O'Reilly Media, Inc.
Content preview from Data Algorithms

Chapter 28. MapReduce and Monoids

Introduction

This chapter is based on Jimmy Lin’s paper[15] titled “Monoidify! Monoids as a Design Principle for Efficient MapReduce Algorithms.” Lin clearly introduces monoids as a design principle for efficient MapReduce algorithms. But what is a monoid, what properties define it, and how does it aid the MapReduce paradigm? Lin shows that when your MapReduce operations are not monoids, it is very hard to use combiners efficiently.

Also, David Saile[13] states that:

A monoid is an algebraic structure with a single associative binary operation and an identity element. For example, the natural numbers1 N form a monoid under addition with identity element zero. In classic MapReduce, the mapper is not constrained, but the reducer is required to be (the iterated application of) an associative operation. Recent research argued that reduction is in fact monoidal in known applications of MapReduce. That is, reduction is indeed the iterated application of an associative operation “•” with a unit u. In the case of the word-occurrence count example, reduction iterates addition “+” with “0” as unit. The parallel execution schedule may be more flexible if commutativity is required in addition to associativity.

A detailed analysis of common MapReduce computations on the basis of monoids can be found in [5]. Next, we will briefly review MapReduce’s combiners and abstract algebra’s monoids and see how they are related to each other. Converting nonassociative ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Data Algorithms with Spark

Data Algorithms with Spark

Mahmoud Parsian
Graph Algorithms

Graph Algorithms

Mark Needham, Amy E. Hodler
Algorithms and Data Structures for Massive Datasets

Algorithms and Data Structures for Massive Datasets

Dzejla Medjedovic, Emin Tahirovic, Ines Schweigert

Publisher Resources

ISBN: 9781491906170Errata PageSupplemental Content