Chapter 11. Programming pipelines with Pig


This chapter covers
  • Customizing data loading in Pig
  • Data analysis with log data
  • Storing data in a compact format with SequenceFiles
  • Effective workflow and performance techniques


Pig is a platform that offers a high-level language with rich data analysis capabilities, making it easy to harness the power of MapReduce.

Pig started life at Yahoo! as a research project to aid in working rapidly with MapReduce for prototyping purposes, and a year later was externalized into an Apache project. It uses its own language, called Pig Latin, to model and operate on data. It’s extensible through user-defined functions (UDFs), which allow users to drop down to Java when needed for fine-grained ...

