Extracting the top Twitter users
As you may have noticed, all the jobs share the same structure, so I will describe this job in great detail and then give only an overview of the other parts.
At the end, your job should look as follows:
The basic steps will be to load the data, filter columns, use aggregate functions, sort data, and store the resulting data.
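The five steps listed above can be sketched outside of Talend and Pig as a plain-Python analogue. This is only an illustration of the data flow, not the job itself. The input layout (tweet id, user name, text), the sample rows, the assumption that "top" means "most tweets", and the helper names `top_users` and `store` are all hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical tab-separated input: tweet id, user name, tweet text.
RAW = (
    "1\talice\thello\n"
    "2\tbob\thi there\n"
    "3\talice\tsecond tweet\n"
    "4\tcarol\tgood morning\n"
    "5\talice\tthird tweet\n"
    "6\tbob\tbye\n"
)


def top_users(raw, limit):
    """Return the `limit` users with the most tweets as (user, count) pairs."""
    # 1. Load the data (here from a string instead of HDFS).
    rows = csv.reader(io.StringIO(raw), delimiter="\t")
    # 2. Filter columns: keep only the user name.
    users = (row[1] for row in rows)
    # 3. Aggregate: count tweets per user.
    counts = Counter(users)
    # 4. Sort by count, highest first; break ties by user name.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]


def store(ranked):
    """5. Store the result (here as tab-separated text in memory)."""
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(ranked)
    return out.getvalue()


result = top_users(RAW, 2)
print(store(result), end="")
```

In the actual job, each numbered comment corresponds to one component in the design workspace, and the data stays on the cluster rather than in memory.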
To create the CH05_01_PIG_TOP_TWITTERS job under a new chapter5 directory, we will need the following components:
tPigLoad component to load the data from HDFS.
Actually, this component can also load data from other types of storage; you can even write your own loader or storer ...