Context Engineering with DSPy

Name: Context Engineering with DSPy
Author: Mike Taylor
ISBN: 9798341671263

by Mike Taylor

December 2026

Intermediate to advanced

300 pages

4h 49m

English

O'Reilly Media, Inc.


Brief Table of Contents (Not Yet Final)
1. Introduction to Context Engineering
    Defining Context Engineering
    Context Engineering vs Prompt Engineering
    Introducing DSPy
    Failure Modes for Long Contexts
    Context Overflow
    Token Costs
    Context Distraction
    Lost in the Middle
    Prompt Injection
    Context Confusion
    Context Fragmentation
    Context Clash
    Context Drift
    Context Poisoning
    Context Engineering Techniques
    Context Editing
    Context Summarization
    Memory Offloading
    Prompt Optimization
    FewShot Learning
    Retrieval Augmented Generation (RAG)
    Tool Loadout
    Guardrails
    Control Flow
    Multi-Agent Systems
    Conclusion
2. Introduction to DSPy
    Language Models
    Setting up OpenAI
    Changing LM Provider
    Signatures
    Defining a Signature
    Get Over the Unfamiliarity
    Modules
    Predict Module
    Built-In Modules
    ReAct Agents
    Evaluation
    Example Datasets
    Evaluation Metrics
    Running Evaluations
    Train-Test Split
    Optimizers
    Running an Optimizer
    GEPA Prompt Optimizer
    Conclusion
3. DSPy in 8 Steps
    3.1 Specify your Signatures
        3.1.1 HumanizeAIText
        3.1.2 DetectAIText
    3.2 Build your Modules
        3.2.1 TextTransformer
        3.2.2 AIDetector
    3.3 Explore a few Examples
        3.3.1 Run your Program
        3.3.2 Inspect History
        3.3.3 MLflow Traces
    3.4 Collect your Dataset
        3.4.1 Example Dataset
        3.4.2 Load from CSV
        3.4.3 Convert to DSPy Example format
        3.4.4 Train-Test Split
    3.5 Define your Metrics
        3.5.1 Exact Match
        3.5.2 LLM Judge
        3.5.3 Testing your Metric
    3.6 Establish a Baseline
        3.6.1 Evaluate your Judge
        3.6.2 Evaluate your Task
    3.7 Optimize your Program
        3.7.1 Distilling from a Smarter model
        3.7.2 Running the GEPA Optimizer
        3.7.3 LLM Judge Metric
        3.7.4 Optimizing your Task Program
        3.7.5 Evaluating your Optimized Program
    3.8 Test and Iterate
        3.8.1 BestOfN Module
    Conclusion
4. Strategies for Collecting Datasets
    4.1 Error Analysis
        4.1.1 Manually Review and Annotate Data
        4.1.2 Categorize Errors
        4.1.3 Define a Dataset
    4.2 Domain Experts
        4.2.1 Qualitative Interview
        4.2.2 Example Correction
        4.2.3 Gold-Standard or Fool’s Gold?
    4.3 Third-Party Datasets
        4.3.1 Built-in DSPy Datasets
        4.3.2 HuggingFace Datasets
        4.3.3 Kaggle Datasets
    4.4 Synthetic Bootstrapping
        4.4.1 Model Distillation
        4.4.2 Data Enrichment
        4.4.3 Differential Privacy
    Conclusion
5. Formalizing Evaluation Metrics
    5.1 String Comparison
        5.1.1 Passage Match
        5.1.2 Regex Match
        5.1.3 Edit Distance
    5.2 Semantic Similarity
        5.2.1 Embedder
        5.2.2 Embedding Distance
        5.2.3 NLP Classifiers
    5.3 LLM-as-a-Judge
        5.3.1 Human-in-the-Loop
        5.3.2 Training a classifier
        5.3.3 Evaluator-Optimizer pattern
    5.4 Panel of Judges
        5.4.1 Criteria Rubrics
        5.4.2 Weighted Scoring
        5.4.3 Multi-predictor Feedback
    Conclusion
6. Deep Dive into Prompt Optimizers
    6.1 Automatic Few-Shot Learning
        6.1.1 LabeledFewShot
        6.1.2 BootstrapFewShot
        6.1.3 BootstrapFewShotWithRandomSearch (BootstrapRS)
        6.1.4 KNNFewShot
        6.1.5 BootstrapFewShotWithOptuna
    6.2 Automatic Instruction Optimization
        6.2.1 COPRO
        6.2.2 MIPROv2
        6.2.3 GEPA
        6.2.4 SIMBA
        6.2.5 InferRules
        6.2.6 AvatarOptimizer
    6.3 Automatic Finetuning
        6.3.1 BootstrapFinetune
        6.3.2 BetterTogether
        6.3.3 GRPO
    6.4 Program Transformations
        6.4.1 Ensemble
    6.5 Choosing the Right Optimizer
    Conclusion
7. Customizing DSPy Programs
    7.1 Built-In Modules
        7.1.1 Predict
        7.1.2 ChainOfThought
        7.1.3 MultiChainComparison
        7.1.4 BestOfN
        7.1.5 Refine
        7.1.6 ReAct
        7.1.7 ProgramOfThought
        7.1.8 CodeAct
        7.1.9 RLM (Recursive language models)
        7.1.10 Building multi-stage modules
        Built-in module summary
    7.2 Other Components
        7.2.1 majority
        7.2.2 Parallel
    7.3 Multimodal Types
        7.3.1 Images
        7.3.2 Audio
        7.3.3 Subclassing BaseType for videos
        7.3.4 Documents with the Attachments library
    7.4 Chat Adapters
        7.4.1 ChatAdapter
        7.4.2 JSONAdapter
        7.4.3 XMLAdapter
        7.4.4 TwoStepAdapter
        7.4.5 Building custom adapters
    Conclusion
About the Author

Content preview from Context Engineering with DSPy

Chapter 5. Formalizing Evaluation Metrics

In Chapter 4 you learned strategies for collecting datasets. You saw how error analysis helps you identify common failure patterns in your program, and how to work with domain experts and language models to build a definitive list of typical inputs and expected outputs. That chapter stressed the importance of collecting useful examples of your task, from which you can learn the ‘rules’ of how your program should work. But a dataset alone isn’t enough: you need a formal way to measure whether your program is getting better or worse. That’s where evaluation metrics come in.

Evaluation is an expert topic, and the primary preoccupation of AI engineers who have a product in production. If you don’t have good evaluation metrics, you don’t know whether your application is failing or succeeding, and you can’t anticipate whether a change you want to make will help or harm your users. Context engineering starts and ends with evaluation: it’s how you notice there’s ...


Publisher Resources

ISBN: 0642572261603
