Chapter 5. Data Engineering
In Chapter 4, we looked at how to use the tools and mechanisms in Data Factory to load data into Fabric. In this chapter, we’ll focus on the Data Engineering experience.
Data engineering involves creating the technical infrastructure required to capture, store, and process significant volumes of data. This field includes designing pipelines to extract data from multiple sources, transforming it to ensure high quality and uniformity, and storing it in databases or storage solutions where it can be analyzed. Data engineers use a variety of technologies to keep these systems reliable, efficient, and scalable. Their work ensures that data is readily available and usable, forming the backbone of data analytics and supporting informed, data-driven decisions within businesses.
A real-world example of data engineering in action can be seen in an ecommerce company that processes millions of transactions daily. Using Microsoft Fabric, data engineers design pipelines that extract raw sales data from various sources, such as web logs, customer databases, and third-party payment processors. Spark jobs running in notebooks clean and aggregate this data (removing duplicates, handling missing values, and standardizing formats) before storing it in a lakehouse for further analysis. Orchestration tools run these processes at scheduled intervals, enabling near-real-time inventory updates and dynamic pricing strategies. This end-to-end workflow allows ...
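The cleaning and aggregation steps in this example can be sketched in a few lines. The following is a minimal illustration in plain Python rather than Spark, so that it runs anywhere; in a Fabric notebook the same logic would more likely be expressed as Spark DataFrame operations. The record fields (`order_id`, `sku`, `amount`, `ts`), the sample data, and the policy of dropping rows with a missing amount are all assumptions made for illustration, not part of any Fabric API.

```python
from collections import defaultdict
from datetime import datetime, timezone
from decimal import Decimal

# Hypothetical raw sales records, as they might arrive from several sources.
RAW_SALES = [
    {"order_id": "A-1", "sku": "shoe-42", "amount": "19.99", "ts": "2024-05-01T10:00:00+00:00"},
    {"order_id": "A-1", "sku": "shoe-42", "amount": "19.99", "ts": "2024-05-01T10:00:00+00:00"},
    {"order_id": "A-2", "sku": " SHOE-42 ", "amount": "5.01", "ts": "2024-05-01T12:30:00+02:00"},
    {"order_id": "A-3", "sku": "hat-7", "amount": None, "ts": "2024-05-01T11:00:00+00:00"},
    {"order_id": "A-4", "sku": "hat-7", "amount": "12.50", "ts": "2024-05-02T09:15:00+00:00"},
]


def clean_sales(records):
    """Remove duplicates, drop rows with a missing amount, standardize formats."""
    seen_order_ids = set()
    cleaned = []
    for record in records:
        # Remove duplicates: keep only the first record per order_id.
        if record["order_id"] in seen_order_ids:
            continue
        seen_order_ids.add(record["order_id"])

        # Handle missing values: here we simply drop the row (an assumed policy).
        if record["amount"] is None:
            continue

        # Standardize formats: trimmed upper-case SKU, exact decimal amount,
        # and a UTC calendar date derived from the timestamp.
        utc_time = datetime.fromisoformat(record["ts"]).astimezone(timezone.utc)
        cleaned.append(
            {
                "order_id": record["order_id"],
                "sku": record["sku"].strip().upper(),
                "amount": Decimal(record["amount"]),
                "date": utc_time.date().isoformat(),
            }
        )
    return cleaned


def daily_revenue(cleaned):
    """Aggregate cleaned records into revenue per (date, sku)."""
    totals = defaultdict(Decimal)
    for row in cleaned:
        totals[(row["date"], row["sku"])] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    for key, total in sorted(daily_revenue(clean_sales(RAW_SALES)).items()):
        print(key, total)
```

Keeping the cleaning step separate from the aggregation step mirrors how such a pipeline is usually staged: the cleaned records can be stored once and reused by several downstream aggregations.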