Chapter 2. Data ingestion design patterns

Data engineering systems rarely generate data themselves. More often, their first stage is acquiring data from various producers. Working with these producers is not easy: they can be other pipelines inside your team, other teams within your company, or even entirely different organizations. Because each producer has its own constraints inherited from its technical and business environment, interacting with them can be challenging.

But you have no choice. You have to adapt. Otherwise, you won’t get any data and, as a result, you won’t be able to feed your data analytics or data science workloads. Or, even worse, you will get some data and share it with your downstream consumers, only to receive complaints a few days later. The complaints may be about an incomplete dataset, an inefficient data organization, or completely broken data that requires an internal restore process and backfilling.

As you can already see, bringing data to ...
