MapReduce Design Patterns

by Donald Miner, Adam Shook
December 2012
Content level: Intermediate to advanced
247 pages
6h 48m
English
O'Reilly Media, Inc.

Chapter 2. Summarization Patterns

Your data is large, with more arriving in the system every day. This chapter focuses on design patterns that produce a top-level, summarized view of your data so you can glean insights that are not available from looking at a localized set of records alone. Summarization analytics are all about grouping similar data together and then performing an operation on each group, such as calculating a statistic, building an index, or simply counting.

Calculating an aggregate over groups in your data set is a quick way to extract value. For example, you might want to calculate the total amount of money your stores have made by state, or the average amount of time someone spends logged into your website by demographic. Typically, with a new data set, you’ll start with these types of analyses to help you gauge what is interesting or unique in your data and what needs a closer look.
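As a minimal sketch of the two aggregates mentioned above, the following plain Python computes a total and an average per group. It runs in memory on a single machine and is not MapReduce code; the sample records and the function names `total_by_group` and `average_by_group` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample records: (state, sale_amount). Not taken from the book.
sales = [
    ("MD", 120.0),
    ("VA", 80.0),
    ("MD", 30.0),
    ("VA", 20.0),
    ("DC", 50.0),
]


def total_by_group(records):
    """Sum the numeric value of every record under its group key."""
    totals = defaultdict(float)
    for key, amount in records:
        totals[key] += amount
    return dict(totals)


def average_by_group(records):
    """Average per group, tracked as a running sum and a running count."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for key, value in records:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


totals = total_by_group(sales)
averages = average_by_group(sales)
```

Keeping a sum and a count for the average, rather than averaging partial averages, is what lets this kind of calculation be split across many workers and merged afterwards.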

The patterns in this chapter are numerical summarizations, inverted index, and counting with counters. They are more straightforward applications of MapReduce than some of the other patterns in this book. This is because grouping data together by a key is the core function of the MapReduce paradigm: all of the values that share a key are grouped together and collected at the same reducer. If the mapper emits the fields you want to group on as its output key, the MapReduce framework handles the grouping for free.
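The grouping behavior described above can be sketched with a small in-memory simulation. This is an illustration in plain Python, not the Hadoop API: the `shuffle` function stands in for the work the framework does between the map and reduce phases, and the record layout and the names `map_record`, `reduce_group`, and `run_job` are assumptions made for the example.

```python
from collections import defaultdict


def map_record(record):
    """Mapper: emit (key, value), where the key is the field to group on."""
    yield record["state"], record["amount"]


def shuffle(pairs):
    """Stand-in for the framework's shuffle: collect values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reducer: receives one key together with every value emitted for it."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence over an in-memory list."""
    pairs = (pair for record in records for pair in map_record(record))
    grouped = shuffle(pairs)
    return dict(reduce_group(k, v) for k, v in sorted(grouped.items()))


# Hypothetical input records, invented for this sketch.
records = [
    {"state": "MD", "amount": 120},
    {"state": "VA", "amount": 80},
    {"state": "MD", "amount": 30},
]
result = run_job(records)
```

Note that neither `map_record` nor `reduce_group` contains any grouping logic. Choosing the output key in the mapper is the only decision the job author makes about grouping; in a real cluster the framework, not a local `shuffle` function, routes each key to a reducer.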

Numerical Summarizations

Pattern Description

The numerical summarizations ...


Publisher Resources

ISBN: 9781449341954