Performing JOINS in Pig

In this recipe, we will learn how to combine datasets in Pig using the different types of joins it supports.

Getting ready

To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Pig installed on it.

How to do it...

JOIN operations are widely used in SQL. Pig Latin also supports joining datasets on a common attribute, and it provides both inner and outer joins. Let's look at the syntax for each in turn.

To learn about joins in Pig, we need two datasets. The first is the employee dataset, which we have been using in earlier recipes; the second is the ID location dataset, which contains each employee's ID and location.
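
As a rough preview of the syntax, the following sketch joins two relations on a shared ID field. The file paths, delimiter, and field names used here are assumptions made for illustration only; the actual datasets in this recipe may be laid out differently.

```pig
-- Hypothetical paths and schemas (assumed for illustration, not taken from the recipe).
emps = LOAD '/data/employees.csv' USING PigStorage(',')
       AS (id:int, name:chararray, salary:int);
locs = LOAD '/data/id_location.csv' USING PigStorage(',')
       AS (id:int, location:chararray);

-- Inner join: keeps only the IDs present in both relations.
inner_j = JOIN emps BY id, locs BY id;

-- Left outer join: keeps every employee; location fields are null when there is no match.
left_j = JOIN emps BY id LEFT OUTER, locs BY id;

-- Right outer join: keeps every location record, matched or not.
right_j = JOIN emps BY id RIGHT OUTER, locs BY id;

-- Full outer join: keeps unmatched records from both sides.
full_j = JOIN emps BY id FULL OUTER, locs BY id;

DUMP inner_j;
```

In the joined relations, fields that share a name are disambiguated with the relation prefix, for example emps::id and locs::id.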

The employee dataset will ...
