Chapter 4. Ingestion and Transformation for Multimodal AI Systems
Our industry loves bold claims: “ETL is obsolete,” “ELT is universal,” “streaming is mandatory.” Here is another one. No matter how advanced your models and AI agents are, they are constrained by the architecture that delivers their data.
In practice, ingestion is a set of trade-offs. Those trade-offs become more complex in multimodal systems, where text, images, telemetry, and events arrive asynchronously. Decisions about batch versus streaming, early versus late transformation, and validation boundaries directly shape latency, cost, reliability, and model performance.
This chapter distills the core design patterns behind multimodal ingestion. We compare batch and streaming pipelines, examine transformation strategies across modalities, and revisit ETL versus ELT through the lens of AI workloads. Using a fictional cloud kitchen company, Byte Eats, as a running example, we trace how real-time sensor data, customer reviews, and ...