Chapter 7: Data Curation Stage – The Silver Layer

The journey of data has now reached a critical stage. Here, the driver (the data engineer) needs to carefully plan and maneuver the vehicle (the data pipeline) around several roadblocks so that the sanity, durability, and security of the data are preserved.

In the previous chapter, we took a deep dive into Delta Lake. Understanding Delta Lake's functionality is a critical skill, as it enables the data engineer to design and develop the silver layer of the lakehouse. In this chapter, we will advance our understanding of how to cleanse raw data. We will start by learning why data curation is needed, and then build a data curation pipeline that can perform the cleaning work ...
