Chapter 1. Data Ingestion
Data ingestion, in essence, involves transferring data from a source to a designated target. Its primary aim is to usher data into an environment primed for staging, processing, analysis, and artificial intelligence/machine learning (AI/ML). While massive organizations may focus on moving data internally (among teams), for most of us, data ingestion emphasizes pulling data from external sources and directing it to in-house targets.
In an era where data holds central importance in both business and product development, the significance of accurate and timely data cannot be overstated. This heightened reliance on data has given rise to a multitude of “sources” from which teams extract information to refine decision-making processes, craft outstanding products, and conduct a multitude of other actions. For instance, a marketing team would need to retrieve data from several advertising and analytics platforms, such as Meta, Google (including Ads and Analytics), Snapchat, LinkedIn, and Mailchimp.
However, as time marches on, APIs and data sources undergo modifications. Columns might be introduced or removed, fields could get renamed, and new versions might replace outdated ones. Handling changes from a single source might be feasible, but what about juggling alterations from multiple sources—five, ten, or even a hundred? The pressing challenge is this: “How can a budding data team efficiently handle these diverse sources in a consistent and expandable way?” ...
Become an O’Reilly member and get unlimited access to this title plus top books and audiobooks from O’Reilly and nearly 200 top publishers, thousands of courses curated by job role, 150+ live events each month,
and much more.
Read now
Unlock full access