Chapter 2. Data Ingestion Design Patterns
Data engineering systems always start by acquiring the data to work on. They are rarely the data generators themselves; most often, they need to interact with various data producers. Working with these producers is not easy, as they can be completely different organizations, different teams within your company, or even different pipelines inside your team. Each of them will have its own constraints, inherited from its technical and business environment, that will make the interaction challenging.
But you have no choice. You have to adapt. Otherwise, you won’t get any data and, as a result, won’t be able to feed your data analytics or data science workloads. Or even worse, you will get some data and share it with your downstream consumers, only to receive complaints a few days later. The complaints may be about an incomplete dataset, an inefficient data organization, or a completely broken dataset that requires an internal restore process and backfilling.
As you ...