Chapter 2. Installing and Running Pig

In this chapter, we show you how to download the Pig binary and start the Pig command line.

Downloading and Installing Pig

Before you can run Pig, whether on your own machine or against your Hadoop cluster, you will need to download and install it. If someone else has taken care of this, you can skip ahead to “Running Pig”.

You can download Pig as a complete package or as source code that you build. You can also get it as part of a Hadoop distribution.

Downloading the Pig Package from Apache

You can download the official version of Apache Pig, which comes packaged with all of the required JAR files, from Pig’s release page.
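Once the release archive is on your machine, installing it amounts to unpacking it and putting its bin directory on your PATH. The following is a minimal sketch of those two steps in Python, not an official installer. It assumes the download is a gzipped tar archive with a single top-level directory; the function names, the archive name, and the target directory are hypothetical and chosen only for illustration.

```python
import os
import tarfile


def unpack_pig(tarball, dest):
    """Unpack a Pig release archive into dest and return the Pig home directory."""
    with tarfile.open(tarball, "r:gz") as archive:
        # Assumption: the archive contains one top-level directory (e.g. pig-x.y.z/).
        top_level = archive.getnames()[0].split("/")[0]
        archive.extractall(dest)
    return os.path.join(dest, top_level)


def path_with_pig(pig_home, current_path):
    """Return a PATH value with Pig's bin directory placed first."""
    return os.pathsep.join([os.path.join(pig_home, "bin"), current_path])


# Example usage (hypothetical archive name and install directory):
#   pig_home = unpack_pig("pig-x.y.z.tar.gz", os.path.expanduser("~/tools"))
#   os.environ["PATH"] = path_with_pig(pig_home, os.environ.get("PATH", ""))
```

Changing PATH this way affects only the current Python process and its children. To make the change permanent, you would add the equivalent export line to your shell profile.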

Pig does not need to be installed on your Hadoop cluster. It runs on the machine from which you launch Hadoop jobs. Though you can run Pig from your laptop or desktop, in practice, most cluster owners set up one or more machines that have access to their Hadoop cluster but are not part of the cluster (that is, they are not DataNodes or TaskTrackers/NodeManagers). This makes it easier for administrators to update Pig and associated tools, as well as to secure access to the clusters. These machines are called gateway machines or edge machines. In this book we use the term gateway machine.

You will need to install Pig on these gateway machines. If your Hadoop cluster is accessible from your desktop or laptop, you can install Pig there as well. Also, you can install Pig on your local machine if you plan to use Pig in local mode (see “Running Pig Locally ...
