Learning Hadoop 2 by Garry Turkington, Gabriele Modena

Summary

In this chapter, we introduced Apache Pig, a platform for large-scale data analysis on Hadoop. In particular, we covered the following topics:

  • The goals of Pig as a way of providing a dataflow-like abstraction that does not require hands-on MapReduce development
  • How Pig's approach to processing data compares to SQL, where Pig is procedural while SQL is declarative
  • Getting started with Pig, which is straightforward because it is a library that generates custom code and doesn't require additional services
  • An overview of the data types, core functions, and extension mechanisms provided by Pig
  • Detailed examples of applying Pig to analyze the Twitter dataset, which demonstrated its ability to express complex concepts very concisely
  • How libraries such ...
