Programming Pig

by Alan Gates

Released October 2011

Publisher(s): O'Reilly Media, Inc.

ISBN: 9781449302641

Book description

This guide is an ideal learning tool and reference for Apache Pig, the open source engine for executing parallel data flows on Hadoop. With Pig, you can batch-process data without having to create a full-fledged application—making it easy for you to experiment with new datasets.

Programming Pig introduces new users to Pig, and provides experienced users with comprehensive coverage of key features such as the Pig Latin scripting language, the Grunt shell, and User Defined Functions (UDFs) for extending Pig. If you need to analyze terabytes of data, this book shows you how to do it efficiently with Pig.

Delve into Pig’s data model, including scalar and complex data types
Write Pig Latin scripts to sort, group, join, project, and filter your data
Use Grunt to work with the Hadoop Distributed File System (HDFS)
Build complex data processing pipelines with Pig’s macros and modularity features
Embed Pig Latin in Python for iterative processing and other advanced tasks
Create your own load and store functions to handle data formats and storage mechanisms
Get performance tips for running scripts on Hadoop clusters in less time
