Engineering the solution

We will engineer the solution by breaking the problem down into several parts. In each part, we will perform one step to import or transform the data, and at the end we will bring everything together to create the view. We will use Sqoop to load the customer master data from a MySQL database into Hive, and HDFS copy commands to load the Apache access logs and the tweets into Hadoop.
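As a rough illustration of the two loading steps, the following Python sketch only builds and prints the command lines; it does not run them. The JDBC URL, user name, table names, file names, and HDFS directories are hypothetical placeholders, not values taken from this chapter. The Sqoop and HDFS options used (--connect, --username, -P, --table, --hive-import, --hive-table, and hdfs dfs -put) are standard ones, but the exact invocation for a given cluster may differ.

```python
import shlex


def sqoop_import_command(jdbc_url, username, table, hive_table):
    """Build a Sqoop command that imports one MySQL table into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,      # JDBC URL of the source database
        "--username", username,
        "-P",                       # prompt for the password at run time
        "--table", table,           # source table in MySQL
        "--hive-import",            # load the imported data into Hive
        "--hive-table", hive_table, # target Hive table
    ]


def hdfs_put_command(local_path, hdfs_dir):
    """Build an HDFS copy command that uploads a local file."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_dir]


if __name__ == "__main__":
    # All names below are hypothetical and used only for illustration.
    commands = [
        sqoop_import_command(
            "jdbc:mysql://localhost/cosmetica", "etl_user",
            "customer", "customer_master",
        ),
        hdfs_put_command("access.log", "/data/weblogs"),
        hdfs_put_command("tweets.json", "/data/tweets"),
    ]
    for command in commands:
        print(shlex.join(command))
```

Keeping each command as a list of arguments makes it easy to pass to subprocess.run later without worrying about shell quoting.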

In the 360-degree view of the customer, we will combine the information from the following sources:

  • Full name, gender, user ID, and e-mail address from the customer master data, which is our system of record
  • Brand names that the customer frequently visits on Cosmetica's web shop, taken from the web logs
  • Tweets on certain topics, as the social media data
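To make the shape of the combined view concrete, here is a minimal in-memory sketch in Python. It is not the Hive-based implementation; it only shows how the three sources listed above could be merged per customer. The function name, the field names, and the assumption that web-log entries and tweets already carry the customer's user ID are all hypothetical.

```python
from collections import Counter


def build_customer_view(customer, page_visits, tweets, topics, top_n=3):
    """Combine master data, web-log brand visits, and tweets for one customer.

    Assumes (hypothetically) that visits and tweets carry a "user_id" field.
    """
    user_id = customer["user_id"]

    # Count brand visits for this customer and keep the most frequent ones.
    brand_counts = Counter(
        visit["brand"] for visit in page_visits if visit["user_id"] == user_id
    )
    frequent_brands = [brand for brand, _ in brand_counts.most_common(top_n)]

    # Keep only this customer's tweets that mention one of the topics.
    lowered_topics = [topic.lower() for topic in topics]
    matching_tweets = [
        tweet["text"]
        for tweet in tweets
        if tweet["user_id"] == user_id
        and any(topic in tweet["text"].lower() for topic in lowered_topics)
    ]

    return {
        "user_id": user_id,
        "full_name": customer["full_name"],
        "gender": customer["gender"],
        "email": customer["email"],
        "frequent_brands": frequent_brands,
        "tweets": matching_tweets,
    }
```

In the actual solution this join would be expressed over Hive tables rather than Python lists; the sketch only fixes the idea that the user ID is the key that ties the three sources together.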
