Chapter 11. Programming pipelines with Pig

 

This chapter covers
  • Customizing data loading in Pig
  • Data analysis with log data
  • Storing data in a compact format with SequenceFiles
  • Effective workflow and performance techniques

 

Pig is a platform that offers a high-level language with rich data analysis capabilities, making it easier to harness the power of MapReduce.

Pig started life at Yahoo! as a research project to speed up prototyping with MapReduce, and a year later was externalized into an Apache project. It uses its own language, called Pig Latin, to model and operate on data. It’s extensible through user-defined functions (UDFs), which allow users to drop down to Java when needed for fine-grained ...
