Chapter 10. Evaluating RAG Systems
You can’t improve what you can’t measure. As you tune retrieval parameters, adjust prompts, or switch models, you need metrics that tell you whether changes help or hurt. Without evaluation, optimization becomes guesswork.
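To make this concrete, here is a minimal sketch of one such metric, hit rate@k: the fraction of test queries for which at least one relevant document appears in the top k retrieved results. The function name and sample data are illustrative, not taken from any particular library:

```python
def hit_rate_at_k(
    retrieved_ids: list[list[str]],
    relevant_ids: list[set[str]],
    k: int = 5,
) -> float:
    """Fraction of queries where at least one relevant document
    appears in the top-k retrieved results."""
    hits = 0
    for retrieved, relevant in zip(retrieved_ids, relevant_ids):
        if any(doc_id in relevant for doc_id in retrieved[:k]):
            hits += 1
    return hits / len(retrieved_ids)

# Hypothetical comparison of two retriever configurations
# on the same two test queries.
baseline = [["d3", "d7", "d1"], ["d2", "d9", "d4"]]  # retrieved per query
tuned    = [["d1", "d3", "d7"], ["d4", "d2", "d9"]]
gold     = [{"d1"}, {"d4"}]                          # relevant per query

print(hit_rate_at_k(baseline, gold, k=1))  # 0.0 -- no top-1 hits
print(hit_rate_at_k(tuned, gold, k=1))     # 1.0 -- the change helped
```

A score like this turns "did my change help?" into a number you can track across retriever configurations, prompts, and models.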
RAG evaluation requires a different approach than evaluating traditional machine learning (ML) models. In traditional ML, you train models to learn patterns from labeled data, then test whether they generalize to unseen examples. RAG systems don’t learn; they retrieve and generate. Foundation models already generalize across tasks. The question isn’t whether the model learned patterns, but whether it retrieved the right information and generated useful answers.
This difference shapes how you evaluate. Test questions should reflect realistic user intent without appearing word for word in the system’s data, while remaining answerable from the available knowledge. Good generalization in RAG means handling different phrasings of the same underlying question: intent matters more than exact wording. That changes test design as well. Rather than splitting existing labeled examples into train and test sets, you reformulate real user queries, as sketched below.
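Here is a minimal sketch of that test design, assuming you have logged user queries and know which document answers each one. The paraphrases, the `retrieve` callback, and all names below are illustrative; in practice you might generate reformulations with an LLM and review them by hand:

```python
# Each test case groups several phrasings of the same intent,
# all of which should retrieve the same gold document.
test_cases = [
    {
        "intent": "offside rule",
        "gold_doc": "laws_of_the_game_law_11",
        "phrasings": [
            "When is a player offside?",            # original logged query
            "Explain the offside rule",             # reformulation
            "What makes an offside position illegal?",
        ],
    },
]

def evaluate_phrasings(retrieve, test_cases, k: int = 5) -> float:
    """Fraction of phrasings whose gold document appears in the top-k
    results. `retrieve` is your retriever: query -> ranked doc IDs."""
    passed = total = 0
    for case in test_cases:
        for query in case["phrasings"]:
            total += 1
            if case["gold_doc"] in retrieve(query)[:k]:
                passed += 1
    return passed / total

# Trivial stand-in retriever, for demonstration only.
fake_retrieve = lambda q: (
    ["laws_of_the_game_law_11"] if "offside" in q.lower() else []
)
print(evaluate_phrasings(fake_retrieve, test_cases))  # 1.0
```

A system that passes only the original phrasing but fails the reformulations is overfitting to surface wording rather than handling the underlying intent.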
Figure 10-1 illustrates the problem: asking philosophical questions to a system that only knows football (soccer) rules produces meaningless evaluation results.
Figure 10-1. RAG systems should only be evaluated on questions that can be answered from their knowledge base.