Skip to Content
Programming Pig
book

Programming Pig

by Alan Gates
October 2011
Intermediate to advanced content levelIntermediate to advanced
220 pages
6h 25m
English
O'Reilly Media, Inc.
Content preview from Programming Pig

Chapter 6. Advanced Pig Latin

In the previous chapter we worked through the basics of Pig Latin. In this chapter we will plumb its depths, and we will also discuss how Pig handles more complex data flows. Finally, we will look at how to use macros and modules to modularize your scripts.

Advanced Relational Operations

We will now discuss the more advanced Pig Latin operators, as well as additional options for operators that were introduced in the previous chapter.

Advanced Features of foreach

In our introduction to foreach (see foreach), we discussed how it could take a list of expressions to output for every record in your data pipeline. Now we will look at ways it can explode the number of records in your pipeline, and also how it can be used to apply a set of operations to each record.

flatten

Sometimes you have data in a bag or a tuple and you want to remove that level of nesting. The baseball data available on GitHub (see Code Examples in This Book) can be used as an example. Because a player can play more than one position, position is stored in a bag. This allows us to still have one entry per player in the baseball file.[15] But when you want to switch around your data on the fly and group by a particular position, you need a way to pull those entries out of the bag. To do this, Pig provides the flatten modifier in foreach:

--flatten.pig players = load 'baseball' as (name:chararray, team:chararray, position:bag{t:(p:chararray)}, bat:map[]); pos = foreach players generate name, flatten(position) ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Start your free trial

You might also like

Programming Pig, 2nd Edition

Programming Pig, 2nd Edition

Alan Gates, Daniel Dai
Pig Design Patterns

Pig Design Patterns

Pradeep Pasupuleti
Apache Hadoop™ YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop™ 2

Apache Hadoop™ YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop™ 2

Arun C. Murthy, Vinod Kumar Vavilapalli, Doug Eadline, Joseph Niemiec, Jeff Markham

Publisher Resources

ISBN: 9781449317881Errata Page