Hadoop in Practice by Alex Holmes

Chapter 12. Crunch and other technologies

 

This chapter covers
  • An exploration of Crunch basics
  • Using Crunch for data analysis
  • A comparison of Crunch and Cascading

 

Up until now we’ve looked at Pig and Hive, which are high-level MapReduce abstractions. Our final foray into MapReduce abstractions is Crunch, a Java library that makes it easy to write and execute MapReduce jobs. Much like Pig, it’s a pipeline-based framework, but because it’s a Java library, it offers more flexibility than Pig does.
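To make the idea of a pipeline-based Java framework concrete, here is a minimal sketch of a word count written as a chain of stages over a collection. It deliberately uses only java.util.stream from the standard library as an analogy. It is not Crunch's API and it runs in a single JVM rather than as MapReduce jobs. The class name PipelineSketch and the method wordCount are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative analogy only: plain Java streams, not Crunch classes.
public class PipelineSketch {

    // Builds a two-stage pipeline:
    //   1. split every line into words (the role a map function would play)
    //   2. group identical words and count them (the role a reduce function would play)
    public static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(word -> !word.isEmpty())
                // TreeMap keeps the result sorted by word for stable output.
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hadoop crunch pig", "crunch hive", "crunch");
        // prints {crunch=3, hadoop=1, hive=1, pig=1}
        System.out.println(wordCount(lines));
    }
}
```

The point of the sketch is the shape of the code: the job is expressed as ordinary Java method calls on a collection, with no hand-written mapper or reducer classes.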

Crunch is compelling in that it allows you to model MapReduce pipelines in Java without having to use MapReduce constructs such as Map/Reduce functions or Writables. Crunch also benefits from not forcing its own type ...
