Hadoop in Practice

Chapter 12. Crunch and other technologies

This chapter covers

An exploration of Crunch basics
Using Crunch for data analysis
A comparison of Crunch and Cascading

Up until now we’ve looked at Pig and Hive, which are high-level MapReduce abstractions. Our final foray into MapReduce abstractions is Crunch, a Java library that makes it easy to write and execute MapReduce jobs. Much like Pig, it’s a pipeline-based framework, but because it’s a Java library, it offers more flexibility than you get with Pig.

Crunch is compelling in that it allows you to model MapReduce pipelines in Java without having to use MapReduce constructs such as Map/Reduce functions or Writables. Crunch also benefits from not forcing its own type ...
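To give a feel for what "pipeline-based" means in Java, here is a minimal sketch of a word count written as a chain of stages. It deliberately does not use Crunch: it's a plain-Java, in-memory stand-in built on java.util.stream, and the class name PipelineSketch and method countWords are hypothetical names chosen for illustration. The point is only the shape of the code, where you describe a sequence of transformations rather than writing separate map and reduce classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java stand-in for a pipeline-style word count.
// Assumption: this is NOT the Crunch API, only an in-memory analogy.
final class PipelineSketch {

    // Stage 1: split each line into lowercase words.
    // Stage 2: drop empty tokens produced by leading punctuation.
    // Stage 3: group identical words and count them (sorted by word).
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(
                        word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("pig and hive", "hive and crunch");
        // prints {and=2, crunch=1, hive=2, pig=1}
        System.out.println(countWords(lines));
    }
}
```

The sketch runs entirely in one JVM; a pipeline library for MapReduce would instead translate a comparable chain of stages into jobs that run on a cluster.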


Hadoop in Practice by Alex Holmes
