
Data Engineering for Multimodal AI

Name: Data Engineering for Multimodal AI
Author: Vasundra Srinivasan
ISBN: 9781098190781

by Vasundra Srinivasan

August 2026

Intermediate to advanced

450 pages

5h 15m

English

O'Reilly Media, Inc.


1. The Multimodal AI Data Engineering Landscape
  Engineering Context for Multimodal AI Systems
  Data Architecture Components in a Multimodal AI System
  Data Contracts: The Foundation of Trust
  Building the Semantic Layer for Multimodal AI Systems
  From Principle to Practice: Granite’s Layered Architecture
  Entity Resolution and Why It’s Hard
  The Serving Layer and Feedback Loop
  A Note on Multimodal Foundation Models
  Building the Semantic Layer: A Step-by-Step Summary
  Key Challenges in Building Multimodal AI Pipelines
  The Multimodal AI Data Lifecycle
  The Stages of the Multimodal Data Lifecycle
    Stage 1: Data Ingestion
    Stage 2: Staging & Organization
    Stage 3: Core Processing
    Stage 4: Feature Engineering
    Stage 5: Model Serving
    Stages 6–8: Feedback & Continuous Learning
    Stage 9: Monitoring & Analysis
  Playbook: Building a Minimal Semantic Layer Pipeline
    Step 1: Data Exploration
    Step 2: Initial Setup and Data Preparation
    Step 3: Data Ingestion Pipeline
    Step 4: Start the Semantic API Server
2. Data Architectures for Multimodal AI Systems
  2.1 Core Patterns for Multimodal Data Architectures
    Complementary Strategies: Hybrid Storage and Real-Time Processing
  2.2 Embeddings and Retrieval in Multimodal Data Architectures
  2.3 Containerization and Orchestration for Multimodal AI
  2.4 Reliability and Scale in Multimodal AI Data Architecture
    Streaming Ingestion
    Scaling Strategies
  2.5 Data Fabric and Mesh Architectures for Multimodal Systems
    Data Fabric Fundamentals
    Data Mesh Fundamentals
  Playbook: Building a Multimodal Feature Store for StreamBuy
    Step 0: Data Exploration
    Step 1: Project Setup and Architecture
    Step 2: Data Models
    Step 3: Feature Encoders
    Step 4: Database Clients
    Step 5: Background Vector
    Step 6: API Layer
    Step 7: Running Tests
    Step 8: Running the Full Stack
    Extending This Pipeline
3. Platforms and Patterns for Multimodal AI Data Engineering
  3.1 Setting the Context: Multimodal Context Intelligence
    Context Intelligence in Action: Revault’s Onboarding Flow
  3.2 Platform Foundations for Multimodal Context Infrastructure
    Archetype 1: Managed Multimodal AI Platforms
    Archetype 2: Composable Multimodal Context
    Archetype 3: Real-Time Streaming Infrastructure for Multimodal Context
    Choosing Between Archetypes: A Cost and Complexity Framework
  3.3 Core System Patterns for Multimodal Context Engineering
    Pattern 1: Multimodal Representation Layer
    Pattern 2: Temporal State Assembly
    Pattern 3: Selective Context Retrieval
    Cross-Cutting Concerns: Observability, Embeddings, and Agentic Orchestration
  3.4 Playbook: Building a Serverless Document Retrieval Pipeline
    Architecture Overview
    Scaling for Production
    Playbook Conclusion
  Conclusion
4. Ingestion and Transformation for Multimodal AI Systems
  Why Is Multimodal Ingestion Fundamentally Harder?
  4.1 Multimodal Ingestion Pipelines
    Traditional Approach
    A Hybrid AI Approach
    The Agentic Multimodal Pipeline Approach
  4.2 Operationalizing ELT for Multimodal AI
    Decision 1: Where and How to Ingest Raw Data
    Decision 2: When to Apply AI-Driven Transformations
    Decision 3: Where to Join and Fuse Modalities
    Decision 4: What Triggers Actions in the System
    Decision 5: Designing for Continuous Feedback
    Decision 6: Governing Schema Evolution Across Modalities
    Balancing Cost, Latency, and Fidelity
  4.3 Data Integrity and Effective Transformation Across Modalities
    What Happens When Validation Fails
    From Validation to Representation
    Observability: Tracing Signals Through the Pipeline
    Annotation: Where Multimodal Systems Mature or Collapse
  Playbook: Building a Core Multimodal AI Data Operations Pipeline
    Step 1: Order Ingestion
    Step 2: Robot Prep Trigger
    Step 3: Feature Extraction from Videos to Completion Trigger
    Step 4: QA Testing
    Step 5: Vector Embeddings
    Step 6: Similarity Search
    Bonus: Scaling Similarity Search to Production
    Improving Matching Quality
  Conclusion
5. Feature Engineering and Management for Multimodal Data
  5.1 Feature Engineering Techniques for Multimodal Data
    A Modality-Aware Feature Engineering Pipeline
    Preprocessing as Feature Engineering
    Extracting Features Across Modalities
    Feature Engineering Core Techniques
  5.2 Feature Selection and Fusion in Multimodal Contexts
    Why Fusion Is Not a Join
    Encoder-Decoder Fusion
    Attention-Based Fusion
    Graph-Based Fusion Methods
    Generative Fusion Models
    Constraint-Based Fusion
    Summarizing Fusion
    Unified Embeddings
  5.3 Implementing and Managing Cloud-Based Multimodal Feature Stores
    Why Feature Stores Are Needed
    Four Patterns for Feature Store Architecture
    The Dual-Layer Pattern
    The Vector-Augmented Hybrid Pattern
    The Metadata-Governed Feature Mesh
    Feature Store as a Service
    Feature Store Implementation: BayGo’s Two Approaches
  Playbook: Building an End-to-End Multimodal Feature Engineering Pipeline
    Familiarizing with the Dataset
    Prerequisites
    Project Overview
    Notes
    Takeaways
About the Author


Chapter 5. Feature Engineering and Management for Multimodal Data

In the previous chapter, we settled the ETL-versus-ELT debate, among other topics. You can now go from “there’s a lot of multimodal data out there” to “here’s how we can bring it in and store it.” Our goal for this chapter is to take you from “I have text and images in my pipeline” to “I know how to extract features from each modality, fuse them, store them in a feature store, and serve them to downstream models.”

Here’s how this chapter will help you get there.

We begin by mastering the core techniques for engineering features across modalities: how to structure raw inputs, extract high-fidelity embeddings, and align them across time and space. You’ll then learn how to select and fuse features using five modern fusion architectures, and how to implement these through pipeline-aware infrastructure. Finally, we’ll move beyond feature engineering into storage, governance, and reusability. You’ll explore ...


Publisher Resources

ISBN: 9781098190774
Errata Page
