Other BigData Tools andTechnologies | 175
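Find the total transactions on date 08-20-2013:
-- group the transaction relation b by its date field
c = group b by date;
-- count the transactions in each date group
d = foreach c generate group as date, COUNT(b.date) as count;
-- keep only the row for the requested date
e = filter d by date == '08-20-2013';
DUMP e;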
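Because only one date is of interest, the same total can also be computed by filtering before grouping, so that Pig does not build a count for every date first. The following is a minimal sketch of that variant. It assumes, as above, that relation b has a chararray field date holding values such as '08-20-2013'. The aliases f, g and h are illustrative names only.

```pig
-- Assumes relation b (defined earlier) has a chararray field named date.
-- Keep only the transactions for the requested date before any grouping.
f = filter b by date == '08-20-2013';
-- Put all remaining rows into a single group.
g = group f all;
-- Count non-null dates in that group, mirroring COUNT(b.date) above.
h = foreach g generate COUNT(f.date) as count;
DUMP h;
```

Both versions should report the same count; this one omits the date column and passes less data to the group step.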
7.4 SQOOP AND FLUME
If an organization has been around for a signicant length of time, then it is likely that all its data
lives in multiple systems other than Hadoop. One might want to move this data into Hadoop or
move data out of Hadoop into other systems in the data and analytics landscape for processing.
M07 Big Data Simplified XXXX 01.indd 175 5/17/2019 2:50:11 PM
176 | Big Data Simplied
Flume and Sqoop are the technologies that transfer data to and from Hadoop. Sqoop is an ETL component of the Hadoop ecosystem.
The name Sqoop is a contraction of ‘SQL to Hadoop’. It is essentially an import-export framework for moving data between Hadoop and relational databases, typically data warehouse systems. In fact, Sqoop connectors are currently available for most data warehouse platforms.
Figure 7.4 gives a snapshot of how data is transferred between Hadoop and relational databases. Let us now discuss the Sqoop operations, using MySQL as the relational database. First, log in to MySQL. The command ‘show databases;’ then lists the databases available in the RDBMS.
Figure 7.4 Data transfer between Hadoop and relational databases
[Diagram: data is imported from an RDBMS (MySQL/Oracle/Teradata, etc.) into the HDFS file system/Hive data warehouse and exported back; legend: E = Extract, T = Transformation, L = Load.]