Skip to Main Content
Hadoop Application Architectures
book

Hadoop Application Architectures

by Mark Grover, Ted Malaska, Jonathan Seidman, Gwen Shapira
July 2015
Intermediate to advanced content levelIntermediate to advanced
250 pages
10h 47m
English
O'Reilly Media, Inc.
Content preview from Hadoop Application Architectures

Chapter 3. Processing Data in Hadoop

In the previous chapters we’ve covered considerations around modeling data in Hadoop and how to move data in and out of Hadoop. Once we have data loaded and modeled in Hadoop, we’ll of course want to access and work with that data. In this chapter we review the frameworks available for processing data in Hadoop.

With processing, just like everything else with Hadoop, we have to understand the available options before deciding on a specific framework. These options give us the knowledge to select the correct tool for the job, but they also add confusion for those new to the ecosystem. This chapter is written with the goal of giving you the knowledge to select the correct tool based on your specific use cases.

We will open the chapter by reviewing the main execution engines—the frameworks directly responsible for executing data processing tasks on Hadoop clusters. This includes the well-established MapReduce framework, as well as newer options such as data flow engines like Spark.

We’ll then move to higher-level abstractions such as Hive, Pig, Crunch, and Cascading. These tools are designed to provide easier-to-use abstractions over lower-level frameworks such as MapReduce.

For each processing framework, we’ll provide:

  • An overview of the framework

  • A simple example using the framework

  • Rules for when to use the framework

  • Recommended resources for further information on the framework

After reading this chapter, you will gain an understanding ...

Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Practical Hadoop Migration: How to Integrate Your RDBMS with the Hadoop Ecosystem and Re-Architect Relational Applications to NoSQL

Practical Hadoop Migration: How to Integrate Your RDBMS with the Hadoop Ecosystem and Re-Architect Relational Applications to NoSQL

Bhushan Lakhe
Modern Big Data Processing with Hadoop

Modern Big Data Processing with Hadoop

Manoj R Patil, V Naresh Kumar, Prashant Shindgikar
Architecting HBase Applications

Architecting HBase Applications

Jean-Marc Spaggiari, Kevin O'Dell

Publisher Resources

ISBN: 9781491910313Errata Page