Chapter 3. Demystifying the Medallion Architecture
In Chapter 1, we explored the evolution of Spark and Delta Lake, and introduced you to the Medallion architecture. This design pattern helps organize data logically within modern lakehouse architectures. It uses three layers (Bronze, Silver, and Gold) to progressively refine datasets as they move through data ingestion, data transformation, and data loading into various destinations.
Explaining the Medallion architecture to organizations often feels like opening a can of worms. Each layer is meant to address different concerns, yet none of them comes with clear definitions or descriptive guidelines. This ambiguity leads to more questions than answers, creating a cycle of confusion and inefficiency.
Despite the popularity of this three-layered design, there’s significant debate about the scope, purpose, and best practices for each layer. Moreover, the gap between theory and practical application is substantial. In this chapter, I will share insights from my practical experience of designing each layer of the Medallion architecture, presented from a theoretical viewpoint. In Part II, the focus shifts from theory to practice: the insights from this chapter carry forward into a hands-on exercise in which you’ll build a real solution architecture.
The Three-Layered Design
Before discussing the specifics of each layer in the Medallion architecture, it’s crucial to understand the high-level purposes and functions of the ...