Extracting the top Twitter users

As you may have noticed, all the jobs share the same structure, so I will describe this job in detail and then give only the big picture for the others.

At the end, your job should look as follows:

The top Twitter Pig job

The basic steps will be to load the data, filter columns, use aggregate functions, sort data, and store the resulting data.
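To make these five steps concrete before building the job, here is a minimal sketch of the same data flow in plain Python. It is not what Talend or Pig generates. The tab-separated layout, the column order (id, user, text), the sample rows, and the names top_users, store, and limit are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample input: tab-separated tweets as (id, user, text).
# The real data set and its schema are not shown in this section.
SAMPLE = (
    "1\talice\thello\n"
    "2\tbob\thi\n"
    "3\talice\tagain\n"
    "4\tcarol\they\n"
    "5\talice\tmore\n"
    "6\tbob\tbye\n"
)


def top_users(source, limit=10):
    """Return (user, tweet_count) pairs, most active users first."""
    # Step 1: load the data (here from any file-like object).
    rows = csv.reader(source, delimiter="\t")
    # Step 2: filter columns, keeping only the user column (assumed index 1).
    users = (row[1] for row in rows if len(row) > 1)
    # Step 3: aggregate, counting the tweets per user.
    counts = Counter(users)
    # Step 4: sort by descending count, then by name to break ties.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]


def store(results, sink):
    """Step 5: store the result as tab-separated lines."""
    writer = csv.writer(sink, delimiter="\t", lineterminator="\n")
    writer.writerows(results)


if __name__ == "__main__":
    out = io.StringIO()
    store(top_users(io.StringIO(SAMPLE)), out)
    print(out.getvalue(), end="")
```

In the Talend job, each of these steps is handled by a dedicated Pig component rather than by hand-written code; the sketch only shows the order in which the data is transformed.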

To create the CH05_01_PIG_TOP_TWITTERS job under a new chapter5 directory, we will need the following components:

  • A tPigLoad component to load the data from HDFS.

    Actually, this component can also load data from other types of storage; you can even extend your own loader or storer ...
