Building AI Intensive Python Applications
by Rachelle Palmer, Ben Perlmutter, Ashwin Gangadhar, Nicholas Larew, Sigfrido Narváez, Thomas Rueckstiess, Henry Weller, Richmond Alake, Shubham Ranjan
Chapter 9. LLM Output Evaluation
Regardless of the form factor of your intelligent application, you must evaluate your use of large language models (LLMs). Evaluating a computational system measures its performance, gauges its reliability, and analyzes its security and privacy.
AI systems are non-deterministic: you cannot know what an AI system will output for a given input until you run that input through it. This means you must evaluate how the AI system performs across a variety of inputs to be confident that it meets your requirements. You also need robust evaluations so that you can change the AI system without introducing unexpected regressions. Evaluations can help catch these regressions before releasing the ...
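A minimal sketch of the idea, not the book's own implementation, is shown below. Because outputs vary between runs, it runs each input several times and reports a pass rate rather than a single pass/fail, then flags a regression when a candidate system scores noticeably below a baseline. All names here (EvalCase, evaluate, is_regression, make_fake_llm) are hypothetical; the substring check is a deliberately simple stand-in for a real pass criterion, and make_fake_llm simulates a non-deterministic model instead of calling an actual LLM.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected_substring: str  # assumed pass criterion: output must contain this text


def evaluate(generate: Callable[[str], str], cases: list[EvalCase], trials: int = 5) -> float:
    """Run every case `trials` times (outputs may differ per run) and return the pass rate."""
    passed = 0
    total = 0
    for case in cases:
        for _ in range(trials):
            output = generate(case.prompt)
            if case.expected_substring.lower() in output.lower():
                passed += 1
            total += 1
    return passed / total if total else 0.0


def is_regression(new_score: float, baseline_score: float, tolerance: float = 0.05) -> bool:
    """Flag a regression when the new score drops more than `tolerance` below the baseline."""
    return new_score < baseline_score - tolerance


def make_fake_llm(seed: int, error_rate: float) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: answers correctly except with probability `error_rate`."""
    rng = random.Random(seed)
    answers = {"What is the capital of France?": "Paris", "What is 2 + 2?": "4"}

    def generate(prompt: str) -> str:
        if rng.random() < error_rate:
            return "I'm not sure."
        return f"The answer is {answers.get(prompt, 'unknown')}."

    return generate


if __name__ == "__main__":
    cases = [
        EvalCase("What is the capital of France?", "Paris"),
        EvalCase("What is 2 + 2?", "4"),
    ]
    baseline = evaluate(make_fake_llm(seed=0, error_rate=0.1), cases, trials=20)
    candidate = evaluate(make_fake_llm(seed=1, error_rate=0.4), cases, trials=20)
    print(f"baseline={baseline:.2f} candidate={candidate:.2f}")
    print("regression detected:", is_regression(candidate, baseline))
```

Running this kind of check on every change, against a fixed set of inputs and a recorded baseline score, is one way to surface regressions before a release. The tolerance absorbs some run-to-run variance; its value here is an arbitrary assumption you would tune for your own system.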