Chapter 12. Crunch and other technologies

 

This chapter covers
  • An exploration of Crunch basics
  • Using Crunch for data analysis
  • A comparison of Crunch and Cascading

 

So far we’ve looked at Pig and Hive, both of which are high-level MapReduce abstractions. Our final foray into MapReduce abstractions is Crunch, a Java library that makes it easy to write and execute MapReduce jobs. Like Pig, it’s a pipeline-based framework, but because it’s a Java library, it offers more flexibility than you get with Pig.
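To make the idea of a pipeline-based job concrete, here is a minimal sketch of a word count written as two chained stages in plain Java. It is deliberately not the Crunch API: the class name PipelineSketch and the methods tokenize, count, and wordCount are hypothetical names chosen for illustration, and the stages run in memory with java.util.stream rather than on a cluster. It only shows the shape of the programming model, in which each stage takes a collection and returns a new one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Plain-Java stand-in for a pipeline-style job. This is NOT the Crunch API;
// all names here are hypothetical and everything runs in memory.
final class PipelineSketch {

    // Stage 1 (map-like): split each line into lowercase words.
    static List<String> tokenize(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toList());
    }

    // Stage 2 (group-and-reduce-like): count occurrences of each word.
    // A TreeMap keeps the keys sorted so the result is deterministic.
    static Map<String, Long> count(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    // The "pipeline": the output of one stage feeds the next.
    static Map<String, Long> wordCount(List<String> lines) {
        return count(tokenize(lines));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "pig hive crunch",
                "crunch pipelines in java");
        System.out.println(wordCount(lines));
    }
}
```

Notice that the stages work with ordinary Java types such as String and Long, and that nothing in the code refers to a mapper, a reducer, or a serialization wrapper.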

Crunch is compelling in that it allows you to model MapReduce pipelines in Java without having to use MapReduce constructs such as Map/Reduce functions or Writables. Crunch also benefits from not forcing its own type ...

