Chapter 12. Crunch and other technologies
- An exploration of Crunch basics
- Using Crunch for data analysis
- A comparison of Crunch and Cascading
So far we’ve looked at Pig and Hive, both of which are high-level MapReduce abstractions. Our final foray into MapReduce abstractions is Crunch, a Java library that makes it easy to write and execute MapReduce jobs. Like Pig, it’s a pipeline-based framework, but because it’s a Java library, it gives you more flexibility than Pig does.
Crunch is compelling in that it allows you to model MapReduce pipelines in Java without having to use MapReduce constructs such as Map/Reduce functions or Writables. Crunch also benefits from not forcing its own type ...