This recipe explains how to join two datasets using Pig. Using the BookCrossing dataset, we join the Books dataset with the Book-Ratings dataset and find the distribution of high ratings (ratings greater than 3) across authors.
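To make the goal concrete, the following is a minimal sketch in plain Python (not Pig) of the logic the recipe computes: keep the high ratings, inner-join them to books on ISBN, and count them per author. The sample rows, the three-column layouts, and the function name `high_rating_counts` are assumptions made for illustration only; the real BookCrossing files contain more columns and far more rows.

```python
from collections import Counter

# Hypothetical sample rows: (isbn, title, author)
books = [
    ("0001", "Book A", "Author X"),
    ("0002", "Book B", "Author Y"),
    ("0003", "Book C", "Author X"),
]

# Hypothetical sample rows: (user_id, isbn, rating)
ratings = [
    ("u1", "0001", 5),
    ("u2", "0001", 2),   # not a high rating, filtered out
    ("u3", "0002", 4),
    ("u1", "0003", 8),
    ("u4", "9999", 9),   # no matching book, dropped by the inner join
]


def high_rating_counts(books, ratings, threshold=3):
    """Count ratings above `threshold` per author (filter, join, group)."""
    # Build the join key lookup: ISBN -> author.
    author_by_isbn = {isbn: author for isbn, _title, author in books}
    counts = Counter()
    for _user, isbn, rating in ratings:
        # Filter on the rating, then inner-join on ISBN.
        if rating > threshold and isbn in author_by_isbn:
            counts[author_by_isbn[isbn]] += 1
    return counts


result = high_rating_counts(books, ratings)
print(result.most_common())  # [('Author X', 2), ('Author Y', 1)]
```

A Pig Latin script expresses the same three stages declaratively and runs them over the full dataset in HDFS, where an in-memory dictionary like the one above would not scale.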
This section describes how to use a Pig Latin script to find the distribution of review ratings per author by joining the Books dataset with the Book-Ratings dataset:
chapter6-bookcrossing-data.tar.gz) from the
chapter6 folder of the code repository.
$ hdfs dfs -mkdir book-crossing ...