Chapter 2. Foundation Models
Foundation models—including LLMs and multimodal models—form the backbone of modern RAG systems. These models are used both to generate answers for users and to prepare content before it’s stored and retrieved.
In the generation step, foundation models analyze retrieved context and user questions to produce grounded responses. In the preparation step, foundation models extract text from images, transcribe audio, summarize long documents, and enrich content with metadata that improves retrieval quality.
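To make the generation step concrete, here is a minimal sketch of how retrieved context and a user question might be assembled into a grounded prompt before being sent to a language model. The function name and prompt wording are illustrative assumptions, not an API from any particular library:

```python
def build_grounded_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble a prompt that asks the model to answer only from retrieved context.

    This is a hypothetical helper for illustration; real systems vary the
    instructions, chunk formatting, and citation scheme.
    """
    # Number each chunk so the model (and the user) can refer back to sources.
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "When was the warranty extended?",
    ["The warranty was extended to 3 years in 2023.", "Returns require a receipt."],
)
```

The resulting string would then be passed to whatever generation model the system uses; the grounding instruction is what steers the model toward the retrieved content rather than its parametric memory.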
This chapter covers the language and multimodal models used in both phases: the preparation step (also called the ingestion phase), where content is processed, transformed, and prepared for storage, and the generation step, where models analyze retrieved information and generate answers for users.
Figure 2-1 shows a typical multimodal workflow for processing video content:
- Use a vision model to analyze video frames.
- Use speech-to-text to transcribe audio.
- Embed the resulting text by using an embedding model.
- Retrieve relevant context when users ask questions.
- Generate answers with a language model.
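The five steps above can be sketched as a single pipeline. In this sketch every model call is a hypothetical stub (the function names, the toy one-dimensional "embeddings," and the word-overlap retrieval are all stand-ins for real vision, speech-to-text, embedding, and language models):

```python
# Minimal sketch of the Figure 2-1 workflow, assuming stub models throughout.

def analyze_frames(frames):
    # Stub for a vision model that captions each frame.
    return [f"caption for {frame}" for frame in frames]

def transcribe_audio(audio_file):
    # Stub for a speech-to-text model.
    return f"transcript of {audio_file}"

def embed(texts):
    # Stub for an embedding model: a toy 1-dimensional "vector" per text.
    return [(text, [float(len(text))]) for text in texts]

def retrieve(index, question, k=2):
    # Toy retrieval: rank stored texts by word overlap with the question.
    scored = sorted(
        index,
        key=lambda entry: -len(set(entry[0].split()) & set(question.split())),
    )
    return [text for text, _vector in scored[:k]]

def generate(question, context):
    # Stub for a language model that answers from the retrieved chunks.
    return f"Answer to {question!r} grounded in {len(context)} chunks"

# Ingestion phase: frames and audio become text, then embeddings.
texts = analyze_frames(["frame_001.png"]) + [transcribe_audio("clip.wav")]
index = embed(texts)

# Query phase: retrieve relevant chunks, then generate an answer.
question = "what is in frame_001.png"
context = retrieve(index, question)
answer = generate(question, context)
```

In a production system each stub would be replaced with a call to the corresponding model, and the index would live in a vector database rather than an in-memory list.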
Figure 2-1. Multimodal models can interpret and generate text, images, audio, and video
Every RAG system needs a generation model that interprets the retrieved content and generates the required output—whether that’s answering a user question, ...