Performing Group By queries in Pig
In this recipe, we will use the Group By operator in Pig scripts to get the desired output.
Getting ready
To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Pig installed on it.
How to do it...
Group By is a very useful operator for data analysis. Pig supports it so that we can compute aggregations per group rather than over the whole dataset. We will use the same employee dataset as in the previous recipe:
1 Tanmay ENGINEERING 5000
2 Sneha PRODUCTION 8000
3 Sakalya ENGINEERING 7000
4 Avinash SALES 6000
5 Manisha SALES 5700
6 Vinit FINANCE 6200
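To make the idea of a group-level aggregation concrete before touching the cluster, here is a minimal sketch in plain Python (not Pig) that groups the sample rows by department and sums the salaries. The column meanings (id, name, department, salary), the choice of a sum as the aggregate, and the helper names `group_by_department` and `total_salary_by_department` are assumptions made for illustration; they are not taken from the recipe's own script.

```python
from collections import defaultdict

# Sample rows from the employee dataset in this recipe.
# Assumed column order: (id, name, department, salary).
employees = [
    (1, "Tanmay", "ENGINEERING", 5000),
    (2, "Sneha", "PRODUCTION", 8000),
    (3, "Sakalya", "ENGINEERING", 7000),
    (4, "Avinash", "SALES", 6000),
    (5, "Manisha", "SALES", 5700),
    (6, "Vinit", "FINANCE", 6200),
]


def group_by_department(rows):
    """Collect rows into one bucket per department (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        department = row[2]
        groups[department].append(row)
    return dict(groups)


def total_salary_by_department(rows):
    """Sum the salary column within each department bucket (hypothetical helper)."""
    return {
        department: sum(member[3] for member in members)
        for department, members in group_by_department(rows).items()
    }


totals = total_salary_by_department(employees)
for department in sorted(totals):
    print(department, totals[department])
```

Grouping first and aggregating second is the same two-step shape that a Group By query follows: each department becomes a key with a bag of matching rows, and the aggregate is then computed once per bag.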
First of all, load the data into HDFS:
hadoop fs -mkdir /pig/emps_data
hadoop fs -put emps.txt /pig/emps_data
Next, ...