Chapter 5. Feature Engineering and Management for Multimodal Data
In the previous chapter, we settled the ETL-versus-ELT debate, among other topics. You can now go from “there’s a lot of multimodal data out there” to “here’s how we can bring it in and store it.” Our goal for this chapter is to take you from saying “I have text and images in my pipeline” to saying “I know how to extract features from each modality, fuse them, store them in a feature store, and serve them to downstream models.”
Here’s how this chapter will help you get there.
We begin by mastering the core techniques for engineering features across modalities: how to structure raw inputs, extract high-fidelity embeddings, and align them across time and space. You’ll then learn how to select and fuse features using five modern fusion architectures, and how to implement these through pipeline-aware infrastructure. Finally, we’ll move beyond feature engineering into storage, governance, and reusability. You’ll explore ...