Chapter 12. Crunch and other technologies


This chapter covers
  • An exploration of Crunch basics
  • Using Crunch for data analysis
  • A comparison of Crunch and Cascading


So far we’ve looked at Pig and Hive, both of which are high-level MapReduce abstractions. Our final foray into MapReduce abstractions is Crunch, a Java library that makes it easy to write and execute MapReduce jobs. Like Pig, it’s a pipeline-based framework, but because it’s a Java library, it offers more flexibility than Pig does.

Crunch is compelling in that it allows you to model MapReduce pipelines in Java without having to use MapReduce constructs such as Map/Reduce functions or Writables. Crunch also benefits from not forcing its own type ...
