Other Big Data Tools and Technologies | 175
Find the total transactions on date 08-20-2013
c = group b by date;
d = foreach c generate group as date,COUNT(b.date) as count;
e = filter d by date == '08-20-2013';
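To make the group, count and filter steps concrete, the same logic can be sketched in plain Python. This is only an illustration, not part of the Pig script: the sample rows and the function names count_by_date and total_on are assumptions, and the relation b is assumed to carry a date field as in the statements above.

```python
from collections import Counter

# Hypothetical sample rows standing in for relation b: (transaction_id, date).
transactions = [
    ("t1", "08-20-2013"),
    ("t2", "08-20-2013"),
    ("t3", "08-21-2013"),
]


def count_by_date(rows):
    """Group rows by date and count each group (the 'group' and 'foreach' steps)."""
    return Counter(date for _, date in rows)


def total_on(rows, target_date):
    """Keep only the count for one date (the 'filter' step).

    Returns 0 when the date is absent; Pig would yield an empty relation instead.
    """
    return count_by_date(rows).get(target_date, 0)


print(total_on(transactions, "08-20-2013"))  # prints 2
```

As in the Pig version, the counts are computed for every date first and the date of interest is selected afterwards.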
7.4 SQOOP AND FLUME
If an organization has been around for a significant length of time, then it is likely that much of its
data lives in systems other than Hadoop. One might want to move this data into Hadoop, or
move data out of Hadoop into other systems in the data and analytics landscape, for processing.
Flume and Sqoop are technologies that allow data to be transferred to and from Hadoop. Sqoop
is an ETL component of the Hadoop ecosystem.
Sqoop is short for 'SQL to Hadoop'. It is an import-export framework for moving
data between Hadoop and relational databases, typically data warehouse systems. Sqoop
connectors exist for the majority of data warehouse platforms at present.
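The import and export directions can be sketched as two Sqoop command lines. The Python helpers below only build the argument lists and do not run Sqoop. The function names, the JDBC URL, the user, the table names and the HDFS paths are all hypothetical; the options shown (--connect, --username, -P, --table, --target-dir, --export-dir) are standard Sqoop 1 options, but the exact set needed depends on the installation.

```python
import shlex


def sqoop_import_cmd(jdbc_url, username, table, target_dir):
    """Build a 'sqoop import' command: relational table -> HDFS directory."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                        # prompt for the password on the console
        "--table", table,
        "--target-dir", target_dir,  # HDFS directory that receives the rows
    ]


def sqoop_export_cmd(jdbc_url, username, table, export_dir):
    """Build a 'sqoop export' command: HDFS directory -> relational table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "-P",
        "--table", table,            # the target table must already exist
        "--export-dir", export_dir,  # HDFS directory holding the data to export
    ]


# Hypothetical values, for illustration only.
url = "jdbc:mysql://localhost/retail"
print(shlex.join(sqoop_import_cmd(url, "root", "transactions", "/user/demo/transactions")))
print(shlex.join(sqoop_export_cmd(url, "root", "daily_totals", "/user/demo/daily_totals")))
```

Building the command as a list keeps each option next to its value, and the list could be passed to subprocess.run on a machine where Sqoop is installed.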
Figure 7.4 gives a snapshot of how data is transferred between Hadoop and relational
databases. Let us now discuss the Sqoop operations, using MySQL as the relational
database. First, log in to MySQL. The command 'show databases;' then lists the
databases available in the RDBMS.
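As a rough sketch of this first step, the helpers below build the command line for running 'show databases;' through the mysql client, together with the Sqoop counterpart, 'sqoop list-databases'. Nothing is executed here. The function names, the user and the JDBC URL are hypothetical; the options themselves (-u, -p and -e for mysql; --connect, --username and -P for Sqoop) are standard ones.

```python
import shlex


def mysql_show_databases_cmd(user):
    """Build a mysql client command that runs 'show databases;' and exits."""
    # -p with no value makes the client prompt for the password.
    return ["mysql", "-u", user, "-p", "-e", "show databases;"]


def sqoop_list_databases_cmd(jdbc_url, username):
    """Build the Sqoop command that lists databases over JDBC."""
    return [
        "sqoop", "list-databases",
        "--connect", jdbc_url,
        "--username", username,
        "-P",  # prompt for the password on the console
    ]


# Hypothetical user and server, for illustration only.
print(shlex.join(mysql_show_databases_cmd("root")))
print(shlex.join(sqoop_list_databases_cmd("jdbc:mysql://localhost", "root")))
```

If both commands report the same databases, that is a quick check that Sqoop can reach the MySQL server before any import is attempted.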
Figure 7.4 Data transfer between Hadoop and relational databases
[Figure labels: HDFS file system; E = Extract, T = Transformation, L = Load data]