MapReduce Design Patterns

by Donald Miner, Adam Shook
December 2012
Content level: Intermediate to advanced
247 pages
6h 48m
English
O'Reilly Media, Inc.

Chapter 4. Data Organization Patterns

In contrast to the previous chapter on filtering, this chapter is all about reorganizing data. The value of individual records is often multiplied by the way they are partitioned, sharded, or sorted. This is especially true in distributed systems, where partitioning, sharding, and sorting can be exploited for performance.

In many organizations, Hadoop and other MapReduce solutions are only a piece in the larger data analysis platform. Data will typically have to be transformed in order to interface nicely with the other systems. Likewise, data might have to be transformed from its original state to a new state to make analysis in MapReduce easier.

This chapter contains several pattern subcategories, as you will see in each pattern description:

  • The structured to hierarchical pattern

  • The partitioning and binning patterns

  • The total order sorting and shuffling patterns

The patterns in this chapter are often used together to solve data organization problems. For example, you may want to restructure your data to be hierarchical, bin the data, and then sort the bins. See Job Chaining in Chapter 6 for more details on combining patterns to solve more complex problems.
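To make the combination concrete, here is a minimal sketch of those three steps chained together. It is plain Python rather than Hadoop code, and everything in it is an assumption made for illustration: the post and comment records, the field names (`post_id`, `year`, `title`), and the helper functions `to_hierarchical`, `bin_records`, and `sort_bins`. In an actual MapReduce setting each stage would more likely be its own job, with the output of one feeding the next.

```python
from collections import defaultdict

# Hypothetical flat input: posts and comments arrive as separate rows.
posts = [
    {"post_id": 1, "year": 2011, "title": "Joins in Hadoop"},
    {"post_id": 2, "year": 2012, "title": "Custom partitioners"},
    {"post_id": 3, "year": 2011, "title": "Combiner pitfalls"},
]
comments = [
    {"post_id": 1, "text": "Try a replicated join."},
    {"post_id": 3, "text": "Combiners may run zero times."},
    {"post_id": 1, "text": "What about skew?"},
]


def to_hierarchical(posts, comments):
    """Stage 1: nest each comment under its parent post."""
    by_post = defaultdict(list)
    for comment in comments:
        by_post[comment["post_id"]].append(comment["text"])
    # Posts without comments get an empty list.
    return [dict(post, comments=by_post[post["post_id"]]) for post in posts]


def bin_records(records, key):
    """Stage 2: place each record into the bin chosen by key(record)."""
    bins = defaultdict(list)
    for record in records:
        bins[key(record)].append(record)
    return dict(bins)


def sort_bins(bins, key):
    """Stage 3: sort the records inside every bin."""
    return {name: sorted(records, key=key) for name, records in bins.items()}


# Chain the stages: the output of each one is the input of the next.
nested = to_hierarchical(posts, comments)
binned = bin_records(nested, key=lambda r: r["year"])
result = sort_bins(binned, key=lambda r: r["title"])

for year in sorted(result):
    for record in result[year]:
        print(year, record["title"], record["comments"])
```

Keeping each stage as a function that takes records in and hands records out is what makes the stages composable here; the same property is what allows separate jobs to be chained.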

Structured to Hierarchical

Pattern Description

The structured to hierarchical pattern creates new records from data that started in a very different structure. Because of its importance, this pattern in many ways stands alone in the chapter.

Intent

Transform ...


Publisher Resources

ISBN: 9781449341954