Chapter 6. Evaluating Multi-Turn Conversations
Until now, we have focused on evaluating interactions where the user sends one message and the agent sends one response back. We call that a single-turn interaction, meaning one exchange between the user and the agent. Many applications involve multi-turn conversations, where the user and agent go back and forth multiple times. Each exchange (one user message and the agent’s response) is a turn. A full conversation from start to finish is a session.
In multi-turn conversations, the agent has to maintain context across turns, follow instructions over time, and respond coherently as the conversation develops. This creates new evaluation challenges that single-turn methods do not cover.
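To make the vocabulary concrete, here is a minimal sketch of how turns and sessions might be represented in code. The names `Turn`, `Session`, `add_turn`, and `context_before` are hypothetical, chosen for illustration only. They do not come from any particular evaluation library.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One exchange: a user message and the agent's response (hypothetical type)."""
    user_message: str
    agent_response: str


@dataclass
class Session:
    """A full conversation from start to finish, stored as ordered turns (hypothetical type)."""
    turns: list[Turn] = field(default_factory=list)

    def add_turn(self, user_message: str, agent_response: str) -> None:
        # Append the next exchange to the end of the conversation.
        self.turns.append(Turn(user_message, agent_response))

    def is_single_turn(self) -> bool:
        # A single-turn interaction is a session with exactly one exchange.
        return len(self.turns) == 1

    def context_before(self, index: int) -> list[Turn]:
        # The earlier turns the agent had to keep in mind when answering turn `index`.
        return self.turns[:index]


# Illustrative data: the second turn only makes sense given the first.
session = Session()
session.add_turn("Book a table for two.", "Sure. For what time?")
session.add_turn("7 pm, and make it three people.", "Done: a table for three at 7 pm.")
```

Under this sketch, a single-turn evaluator would look at one `Turn` in isolation, while a multi-turn evaluator would also need the output of `context_before` to judge whether the agent carried context forward.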
In this chapter, you will learn:
- How to evaluate at the session, turn, and coherence levels
- When to isolate failures as single-turn problems versus genuinely multi-turn issues
- How to use perturbation testing to probe robustness
- How to build session-level evaluators before investing in turn-level analysis
The core evaluation ...