Programming Pig, 2nd Edition

by Alan Gates, Daniel Dai
November 2016
Intermediate to advanced
368 pages
9h 59m
English
O'Reilly Media, Inc.

Chapter 4. Introduction to Pig Latin

It is time to dig into Pig Latin. This chapter provides you with the basics of Pig Latin, enough to write your first useful scripts. More advanced features of Pig Latin are covered in Chapter 5.

Preliminary Matters

Pig Latin is a data flow language. Each processing step results in a new dataset, or relation. In the statement input = load 'data', input is the name of the relation produced by loading the dataset data. A relation name is referred to as an alias. Relation names look like variables, but they are not: once made, an assignment is permanent. It is possible to reuse relation names; for example, this is legitimate:

A = load 'NYSE_dividends' as (exchange, symbol, date, dividends:float);
A = filter A by dividends > 0;
A = foreach A generate UPPER(symbol);

However, this is not recommended. It looks as if you are reassigning A, but you are really creating a new relation named A at each step and losing the ability to refer to the previous one. Pig can keep track of which relation each A refers to, but it is still poor practice: it makes your scripts harder to read (which A am I referring to?) and error messages harder to interpret.
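As a sketch of the alternative, here are the same three steps written with a distinct alias for each relation. The aliases divs, paid, and upper_syms are illustrative names chosen here rather than taken from the original text, and dividends is declared as a float (an assumption about the data) so that the comparison with 0 is numeric.

```pig
-- Same pipeline as the earlier example, but each step gets its own alias.
-- divs, paid, and upper_syms are illustrative names.
divs = load 'NYSE_dividends' as (exchange, symbol, date, dividends:float);
-- Keep only records with a positive dividend; divs itself is not modified.
paid = filter divs by dividends > 0;
-- Project the upper-cased ticker symbol from the filtered relation.
upper_syms = foreach paid generate UPPER(symbol);
```

Because divs is never shadowed, a later statement can still read the unfiltered data, and an error message that names paid points to exactly one statement in the script.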

In addition to relation names, Pig Latin also has field names. They name a field (or column) in a relation. In the previous snippet of Pig Latin, dividends and symbol are examples of field names. These are somewhat like variables in that they will contain a different value for each record as it passes through the pipeline, but you cannot assign values to ...
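A minimal sketch of field names in use, again assuming the NYSE_dividends layout from the earlier example with dividends declared as a float. The names scaled and times100 are hypothetical, chosen only for illustration.

```pig
divs = load 'NYSE_dividends' as (exchange, symbol, date, dividends:float);
-- symbol and dividends are field names: for each record passing through,
-- these expressions are evaluated against that record's own values.
-- No field is assigned to; generate builds a new record for each input record.
scaled = foreach divs generate symbol, dividends, dividends * 100 as times100;
```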

ISBN: 9781491937082