Chapter 8. Optimization
Your agent works. It reasons through multi-step problems, queries live systems, and learns from its own mistakes. In a notebook, it looks like the future of enterprise AI. Then someone asks what it costs to run.
The invoice from a single week of production traffic, with every workflow node powered by a frontier model and every query routed through the most expensive option, is enough to shelve the project. Cost is only the first problem. An agent with unrestricted access to your knowledge graph can traverse from a public service catalog to internal cost data to employee records in a single query. A graph traversal that completes in milliseconds during development takes seconds at production scale, and a federation of specialist models that reasons beautifully offline becomes a latency bottleneck when an SRE is waiting for an incident diagnosis.
Production is where three forces collide: cost, governance, and performance. This chapter addresses all three.
We will start with selective intelligence, the ...