Chapter 3. Data Integration, Quality, and Enrichment

In the preceding chapter, we looked in detail at how huge volumes of data are brought into the Data Lake's Intake Tier from various External Data Sources. We learned about Hadoop-oriented data transfer mechanisms that either pull data from the sources or push it in near real time, and that perform historical or incremental loads. We also saw the key functionalities implemented in the Data Intake Tier and got architectural guidance on the relevant Big Data tools and technologies.

Now that the data has been acquired into the Data Lake, this chapter explores the next logical steps performed on it. In a nutshell, we will take a closer look at the Management Tier ...

Data Lake Development with Big Data by Pradeep Pasupuleti, Beulah Salome Purra
