RAG with Python Cookbook

by Dominik Polzer
May 2026
Intermediate to advanced
378 pages
8h 17m
English
O'Reilly Media, Inc.

Chapter 2. Foundation Models

Foundation models—including LLMs and multimodal models—form the backbone of modern RAG systems. These models are used both to generate answers for users and to prepare content before it’s stored and retrieved.

In the generation step, foundation models analyze retrieved context and user questions to produce grounded responses. In the preparation step, foundation models extract text from images, transcribe audio, summarize long documents, and enrich content with metadata that improves retrieval quality.
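As a minimal sketch of what "grounded" can mean in the generation step, the snippet below assembles a prompt from retrieved passages and a user question. The function name `build_grounded_prompt`, the prompt wording, and the sample passages are illustrative assumptions, not taken from the book; the resulting string would be sent to whichever language model the system uses.

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the user question into a single prompt.

    Hypothetical helper for illustration: the instruction text and the
    numbering scheme are assumptions, not a prescribed format.
    """
    # Number each retrieved passage so the model can refer back to it.
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up sample passages, standing in for retrieved content.
prompt = build_grounded_prompt(
    "Who led the race at lap 12?",
    [
        "Lap 12: car 7 overtakes car 3 for the lead.",
        "Lap 13: light rain begins.",
    ],
)
print(prompt)
```

Keeping prompt assembly in one small function makes it easy to inspect exactly what context the model received when an answer looks wrong.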

This chapter focuses on the language models and multimodal models used in both steps: the preparation step (also called the ingestion phase), where content is processed, transformed, and prepared for storage, and the generation step, where models analyze retrieved information and generate answers for users.

Figure 2-1 shows a typical multimodal workflow for processing video content:

  1. Use a vision model to analyze video frames.

  2. Use speech-to-text to transcribe audio.

  3. Embed the resulting text by using an embedding model.

  4. Retrieve relevant context when users ask questions.

  5. Generate answers with a language model.

Diagram illustrating a process involving vision models, speech-to-text, embeddings, and language models to analyze race footage and answer a question about a specific event during the race.
Figure 2-1. Multimodal models can interpret and generate text, images, audio, and video
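The five steps listed above can be sketched end to end with placeholder functions. Everything here is an assumption made for illustration: `describe_frames`, `transcribe_audio`, and `generate` are stubs standing in for a real vision model, speech-to-text model, and language model, and the "embedding" is a toy bag-of-words vector, not a learned one. The sample race data is invented.

```python
import math
import re
from collections import Counter


def describe_frames(frames: list[str]) -> list[str]:
    # Stub for step 1 (vision model): frames are given as text labels here.
    return [f"Frame shows: {frame}" for frame in frames]


def transcribe_audio(segments: list[str]) -> list[str]:
    # Stub for step 2 (speech-to-text): segments are already text here.
    return [f"Commentary: {segment}" for segment in segments]


def embed(text: str) -> Counter:
    # Step 3, toy version: word counts stand in for an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    # Step 4: rank stored texts by similarity to the question, keep the top k.
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def generate(question: str, context: list[str]) -> str:
    # Stub for step 5 (language model): echoes the best-matching passage.
    return f"Q: {question} | Most relevant passage: {context[0]}"


# Preparation: turn video frames and audio into text, then index it.
texts = describe_frames(
    ["car 7 passes car 3 on the final corner", "pit crew changes tires"]
) + transcribe_audio(
    ["car 7 takes the lead on lap 12", "rain is expected later"]
)
index = [(text, embed(text)) for text in texts]

# Generation: retrieve context for a question and produce an answer.
question = "When did car 7 take the lead?"
top = retrieve(question, index)
answer = generate(question, top)
print(answer)
```

In a real system, each stub would be replaced by a model call and the list-based index by a vector store; the data flow between the steps stays the same.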

Every RAG system needs a generation model that interprets the retrieved content and generates the required output—whether that’s answering a user question, ...


ISBN: 9798341600553