Chapter 5. Getting Started
Traditional data platforms follow a standard paradigm: they take data feeds from multiple sources, load them into a staging area, transform them, and load the results into a data warehouse that business intelligence tools query.
In this chapter, I will explain how a Hadoop-based big data platform can be built using a similar paradigm.
I will cover the full data life cycle of a project on a risk and regulatory big data platform:
- Data collection: ingesting data from multiple sources, scheduled with Oozie or Informatica
- Data transformation: transforming data using Hive, Pig, and Java MapReduce
- Data analysis: integrating BI tools with Hadoop
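The three stages above can be sketched in miniature. The following is a pure-Python stand-in for the ingest, transform, and load steps, not actual Hadoop, Oozie, or Hive APIs; all function and field names here are illustrative assumptions.

```python
# Minimal sketch of the staged ETL paradigm: collect -> transform -> load.
# Names (ingest, transform, load, exposure, fx_rate) are hypothetical,
# chosen only to illustrate the flow; real pipelines would use Hadoop tooling.

def ingest(sources):
    """Data collection: pull raw records from each feed into a staging list."""
    staging = []
    for name, records in sources.items():
        for rec in records:
            staging.append({"source": name, **rec})
    return staging

def transform(staging):
    """Data transformation: cleanse and enrich staged records
    (the role played by Hive, Pig, or Java MapReduce jobs)."""
    return [
        {**rec, "exposure_usd": round(rec["exposure"] * rec["fx_rate"], 2)}
        for rec in staging
        if rec["exposure"] >= 0  # drop invalid rows during cleansing
    ]

def load(warehouse, rows):
    """Load transformed results into the warehouse queried by BI tools."""
    warehouse.extend(rows)
    return warehouse

if __name__ == "__main__":
    feeds = {
        "trading": [{"exposure": 100.0, "fx_rate": 1.1}],
        "lending": [{"exposure": -5.0, "fx_rate": 1.0}],  # invalid, filtered out
    }
    warehouse = load([], transform(ingest(feeds)))
    print(warehouse)
```

On a real cluster each function would be a separate job (ingestion scheduled by Oozie or Informatica, transformation in Hive or MapReduce), but the data flow between stages is the same.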
This chapter will again be a bit more technical, with architecture and data flow diagrams, ...