Building Applications with AI Agents

Name: Building Applications with AI Agents
Author: Michael Albada
ISBN: 9781098176501

by Michael Albada

September 2025

Beginner to intermediate

354 pages

10h 47m

English

O'Reilly Media, Inc.

Includes

Quizzes

Preface
What This Book Is About; What This Book Is Not; Who This Book Is For; Navigating This Book; Conventions Used in This Book; Using Code Examples; O’Reilly Online Learning; How to Contact Us; Acknowledgments
1. Introduction to Agents
Defining AI Agents; The Pretraining Revolution; Types of Agents; Model Selection; From Synchronous to Asynchronous Operations; Practical Applications and Use Cases; Workflows and Agents; Principles for Building Effective Agentic Systems; Organizing for Success in Building Agentic Systems; Agentic Frameworks; LangGraph; AutoGen; CrewAI; OpenAI Agents Software Development Kit (SDK); Conclusion
2. Designing Agent Systems
Our First Agent System; Core Components of Agent Systems; Model Selection; Tools; Designing Capabilities for Specific Tasks; Tool Integration and Modularity; Memory; Short-Term Memory; Long-Term Memory; Memory Management and Retrieval; Orchestration; Design Trade-Offs; Performance: Speed/Accuracy Trade-Offs; Scalability: Engineering Scalability for Agent Systems; Reliability: Ensuring Robust and Consistent Agent Behavior; Costs: Balancing Performance and Expense; Architecture Design Patterns; Single-Agent Architectures; Multiagent Architectures: Collaboration, Parallelism, and Coordination; Best Practices; Iterative Design; Evaluation Strategy; Real-World Testing; Conclusion
3. User Experience Design for Agentic Systems
Interaction Modalities; Text-Based; Graphical Interfaces; Speech and Voice Interfaces; Video-Based Interfaces; Combining Modalities for Seamless Experiences; The Autonomy Slider; Synchronous Versus Asynchronous Agent Experiences; Design Principles for Synchronous Experiences; Design Principles for Asynchronous Experiences; Finding the Balance Between Proactive and Intrusive Agent Behavior; Context Retention and Continuity; Maintaining State Across Interactions; Personalization and Adaptability; Communicating Agent Capabilities; Communicating Confidence and Uncertainty; Asking for Guidance and Input from Users; Failing Gracefully; Trust in Interaction Design; Conclusion
4. Tool Use
LangChain Fundamentals; Local Tools; API-Based Tools; Plug-In Tools; Model Context Protocol; Stateful Tools; Automated Tool Development; Foundation Models as Tool Makers; Real-Time Code Generation; Tool Use Configuration; Conclusion
5. Orchestration
Agent Types; Reflex Agents; ReAct Agents; Planner-Executor Agents; Query-Decomposition Agents; Reflection Agents; Deep Research Agents; Tool Selection; Standard Tool Selection; Semantic Tool Selection; Hierarchical Tool Selection; Tool Execution; Tool Topologies; Single Tool Execution; Parallel Tool Execution; Chains; Graphs; Context Engineering; Conclusion
6. Knowledge and Memory
Foundational Approaches to Memory; Managing Context Windows; Traditional Full-Text Search; Semantic Memory and Vector Stores; Introduction to Semantic Search; Implementing Semantic Memory with Vector Stores; Retrieval-Augmented Generation; Semantic Experience Memory; GraphRAG; Using Knowledge Graphs; Building Knowledge Graphs; Promise and Peril of Dynamic Knowledge Graphs; Note-Taking; Conclusion
7. Learning in Agentic Systems
Nonparametric Learning; Nonparametric Exemplar Learning; Reflexion; Experiential Learning; Parametric Learning: Fine-Tuning; Fine-Tuning Large Foundation Models; The Promise of Small Models; Supervised Fine-Tuning; Direct Preference Optimization; Reinforcement Learning with Verifiable Rewards; Conclusion
8. From One Agent to Many
How Many Agents Do I Need?; Single-Agent Scenarios; Multiagent Scenarios; Swarms; Principles for Adding Agents; Multiagent Coordination; Democratic Coordination; Manager Coordination; Hierarchical Coordination; Actor-Critic Approaches; Automated Design of Agent Systems; Communication Techniques; Local Versus Distributed Communication; Agent-to-Agent Protocol; Message Brokers and Event Buses; Actor Frameworks: Ray, Orleans, and Akka; Orchestration and Workflow Engines; Managing State and Persistence; Conclusion
9. Validation and Measurement
Measuring Agentic Systems; Measurement Is the Keystone; Integrating Evaluation into the Development Lifecycle; Creating and Scaling Evaluation Sets; Component Evaluation; Evaluating Tools; Evaluating Planning; Evaluating Memory; Evaluating Learning; Holistic Evaluation; Performance in End-to-End Scenarios; Consistency; Coherence; Hallucination; Handling Unexpected Inputs; Preparing for Deployment; Conclusion

10. Monitoring in Production
Monitoring Is How You Learn; Monitoring Stacks; Grafana with OpenTelemetry, Loki, and Tempo; ELK Stack (Elasticsearch, Logstash/Fluentd, Kibana); Arize Phoenix; SigNoz; Langfuse; Choosing the Right Stack; OTel Instrumentation; Visualization and Alerting; Monitoring Patterns; Shadow Mode; Canary Deployments; Regression Trace Collection; Self-Healing Agents; User Feedback as an Observability Signal; Distribution Shifts; Metric Ownership and Cross-Functional Governance; Conclusion
11. Improvement Loops
Feedback Pipelines; Automated Issue Detection and Root Cause Analysis; Human-in-the-Loop Review; Prompt and Tool Refinement; Aggregating and Prioritizing Improvements; Experimentation; Shadow Deployments; A/B Testing; Bayesian Bandits; Continuous Learning; In-Context Learning; Offline Retraining; Conclusion
12. Protecting Agentic Systems
The Unique Risks of Agentic Systems; Emerging Threat Vectors; Securing Foundation Models; Defensive Techniques; Red Teaming; Threat Modeling with MAESTRO; Protecting Data in Agentic Systems; Data Privacy and Encryption; Data Provenance and Integrity; Handling Sensitive Data; Securing Agents; Safeguards; Protections from External Threats; Protections from Internal Failures; Conclusion
13. Human-Agent Collaboration
Roles and Autonomy; The Changing Role of Humans in Agent Systems; Aligning Stakeholders and Driving Adoption; Scaling Collaboration; Agent Scope and Organizational Roles; Shared Memory and Context Boundaries; Trust, Governance, and Compliance; The Lifecycle of Trust; Accountability Frameworks; Escalation Design and Oversight; Privacy and Regulatory Compliance; Conclusion: The Future of Human-Agent Teams
Glossary
Index
About the Author

Content preview from Building Applications with AI Agents

Chapter 9. Validation and Measurement

It has never been easier to build products and applications, but effectively measuring these systems remains an enormous challenge. While teams are often under pressure to ship things quickly, taking the time to rigorously evaluate performance and assess quality pays long-term dividends and enables teams to ultimately move faster and with more confidence. Without rigorous evaluation and measurement, decisions about which changes to ship become much more difficult. Rigorous measurement and validation become essential, not only to optimize performance but also to build trust and ensure alignment with user expectations.

This chapter explores methodologies for evaluating agent-based systems, covering key principles, measurement techniques, and validation strategies. We explore the critical role of defining clear objectives, selecting appropriate metrics, and implementing robust testing frameworks to assess system performance under real-world conditions. Beyond mere functionality, the reliability of agent outputs—including accuracy, consistency, coherence, and responsiveness—requires systematic scrutiny, particularly given the probabilistic nature of foundation models that often power these systems.
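
As a rough illustration of why probabilistic outputs call for repeated measurement, the sketch below runs a stand-in agent several times on one input and reports two simple metrics: accuracy against an expected action, and consistency (the share of runs that agree with the most common output). Everything named here is hypothetical and not taken from the book: the `evaluate` helper, the stub agent, the action labels, and the example prompt are assumptions chosen only to make the idea concrete.

```python
import random
from collections import Counter
from typing import Callable


def evaluate(agent: Callable[[str], str], prompt: str, expected: str, runs: int = 20) -> dict:
    """Call the agent `runs` times on one prompt and summarize the outputs."""
    outputs = [agent(prompt) for _ in range(runs)]
    counts = Counter(outputs)
    modal_output, modal_count = counts.most_common(1)[0]
    return {
        # Fraction of runs that produced the expected action.
        "accuracy": counts[expected] / runs,
        # Fraction of runs that agree with the most frequent action.
        "consistency": modal_count / runs,
        "modal_output": modal_output,
    }


def make_stub_agent(seed: int, p_correct: float = 0.8) -> Callable[[str], str]:
    """Hypothetical stand-in for a model-backed agent with noisy behavior."""
    rng = random.Random(seed)

    def agent(prompt: str) -> str:
        # The prompt is ignored; a real agent would call a foundation model here.
        return "issue_refund" if rng.random() < p_correct else "escalate_to_human"

    return agent


if __name__ == "__main__":
    report = evaluate(
        make_stub_agent(seed=0),
        prompt="My order arrived damaged and I would like a refund.",
        expected="issue_refund",
    )
    print(report)
```

An agent can score high on consistency while scoring low on accuracy (it is reliably wrong), which is one reason to track the two numbers separately rather than fold them into a single score.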

Throughout this chapter, we follow a customer support agent handling a common ecommerce scenario: a customer reports a cracked coffee mug and requests a refund. We’ll build on this case, exploring variations like multi-item orders, cancellations, or ...

Publisher Resources

ISBN: 9781098176495
Errata Page
