Chapter 2. Installing and Running Pig
Downloading and Installing Pig
Before you can run Pig on your machine or your Hadoop cluster, you will need to download and install it. If someone else has taken care of this, you can skip ahead to Running Pig.
You can download Pig as a complete package or as source code that you build. You can also get it as part of a Hadoop distribution.
Downloading the Pig Package from Apache
This is the official version of Apache Pig. It comes packaged with all of the JAR files needed to run Pig. It can be downloaded by going to Pig’s release page.
Pig does not need to be installed on your Hadoop cluster. It runs on the machine from which you launch Hadoop jobs. Though you can run Pig from your laptop or desktop, in practice, most cluster owners set up one or more machines that have access to their Hadoop cluster but are not part of the cluster (that is, they are not data nodes or task nodes). This makes it easier for administrators to update Pig and associated tools, as well as to secure access to the clusters. These machines are called gateway machines or edge machines. In this book I use the term gateway machine.
You will need to install Pig on these gateway machines. If your Hadoop cluster is accessible from your desktop or laptop, you can install Pig there as well. Also, you can install Pig on your local machine if you plan to use Pig in local mode.
The core of Pig is written in Java and is thus portable across operating systems. The shell script that starts ...