Chapter 2. LLMs and Evaluation Basics
In Chapter 1, we motivated why evaluation is essential. But before we can evaluate, we need something to evaluate! This chapter equips you with everything you need to get a first application running and to understand its moving parts.
Specifically, we will:
- Ground the discussion with a simple example application.
- Break down the components of an LLM application, from the simplest single call to complex agent loops.
- Describe some fundamentals of creating prompts or initial versions of these components.
- Clarify the two main evaluation modes we will use throughout this book:
  - Absolute evaluation: judging whether one configuration (e.g., a specific prompt, model, or tool setup) is correct enough to ship.
  - Comparative evaluation: comparing multiple options (A/B tests, pairwise, ranking, leaderboards) to decide which is better.

By the end of this chapter, you will have a clear mental model of what makes up an LLM application and some of the types of evaluation you might do for your application.
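To make the two evaluation modes concrete before we go further, here is a minimal sketch in Python. It is an illustration only, not the method used later in the book: the function names (`pass_rate`, `absolute_eval`, `comparative_eval`), the exact-match scoring rule, and the 0.9 default threshold are all assumptions chosen for brevity. Real applications usually need richer correctness checks than string matching.

```python
from typing import Sequence


def _matches(output: str, expected: str) -> bool:
    # Assumed scoring rule: case-insensitive exact match after trimming.
    return output.strip().lower() == expected.strip().lower()


def pass_rate(outputs: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of outputs that match their expected answer."""
    if not outputs or len(outputs) != len(expected):
        raise ValueError("outputs and expected must be non-empty and equal length")
    return sum(_matches(o, e) for o, e in zip(outputs, expected)) / len(outputs)


def absolute_eval(
    outputs: Sequence[str], expected: Sequence[str], threshold: float = 0.9
) -> bool:
    """Absolute evaluation: is this one configuration good enough to ship?"""
    # The threshold is a hypothetical ship bar; choose one that fits your use case.
    return pass_rate(outputs, expected) >= threshold


def comparative_eval(
    outputs_a: Sequence[str], outputs_b: Sequence[str], expected: Sequence[str]
) -> str:
    """Comparative evaluation: which of two configurations wins more examples?"""
    wins_a = wins_b = 0
    for a, b, e in zip(outputs_a, outputs_b, expected):
        a_ok, b_ok = _matches(a, e), _matches(b, e)
        if a_ok and not b_ok:
            wins_a += 1
        elif b_ok and not a_ok:
            wins_b += 1
        # Examples where both are right or both are wrong count for neither.
    if wins_a == wins_b:
        return "tie"
    return "A" if wins_a > wins_b else "B"


# Toy data standing in for the outputs of two prompt variants.
expected = ["4", "Paris", "blue"]
config_a = ["4", "paris", "red"]
config_b = ["5", "Paris", "red"]

ships = absolute_eval(config_a, expected, threshold=0.9)
winner = comparative_eval(config_a, config_b, expected)
```

Note that the two modes can disagree in a useful way: a configuration can win a comparison and still fall short of the bar needed to ship.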
Components of ...