Chapter 5. Evaluation and Optimization Strategies
We’ve now constructed our multimodal question-answering agent, a system capable of ingesting diverse data types and providing relevant answers. It works; it fulfills its designed function. In the world of LLMs and agents, however, “functional” is just the starting line. The real challenge—and where true value is unlocked—lies in the journey from functional to optimal.
How quickly does it respond? How consistently accurate is it across a vast range of unseen queries, especially ambiguous ones? If it uses tools, how reliably and efficiently does it invoke them? Are its responses not just correct, but also concise, helpful, and perfectly aligned with the user’s nuanced intent? And, critically, as your systems evolve and interact with more complex data and tasks, how do you ensure they maintain performance, learn from experience, and continuously improve?
This chapter explores practical approaches for evaluation and optimization—two sides of the same coin in the journey to production excellence. You can’t meaningfully improve what you can’t measure, and you can’t know if your optimizations are effective without robust evaluation methods. We’ve structured this chapter to reflect this natural cycle: first establishing frameworks for systematically measuring performance across multiple dimensions, then applying targeted optimization techniques based on those insights.
In the evaluation section, we’ll explore both human-centered assessment ...