Chapter 11. Programming pipelines with Pig
- Customizing data loading in Pig
- Data analysis with log data
- Storing data in a compact format with SequenceFiles
- Effective workflow and performance techniques
Pig is a platform that offers a high-level language with rich data analysis capabilities, which makes it easier to harness the power of MapReduce.
Pig started life at Yahoo! as a research project to aid in rapid prototyping with MapReduce, and a year later was externalized into an Apache project. It uses its own language, called Pig Latin, to model and operate on data. It’s extensible through user-defined functions (UDFs), which allow users to drop down to Java when needed for fine-grained ...