Chapter 10. Testing: Evaluation, Monitoring, and Continuous Improvement
In Chapter 9, you learned how to deploy your AI application to production and use LangGraph Platform to host and debug your app.
Although your app can respond to user inputs and execute complex tasks, its underlying LLM is nondeterministic and prone to hallucination. As discussed in previous chapters, LLMs can generate inaccurate or outdated outputs for a variety of reasons, including the prompt, the format of the user's input, and the retrieved context. In addition, harmful or misleading LLM outputs can significantly damage a company's brand and customer loyalty.
To combat this tendency toward hallucination, you need to build an efficient system to test, evaluate, monitor, and continuously improve your LLM applications’ performance. This robust testing process will enable you to quickly debug and fix AI-related issues before and after your app is in production.
In this chapter, you'll learn how to build an iterative testing system across the key stages of the LLM app development lifecycle so you can maintain high application performance.
Testing Techniques Across the LLM App Development Cycle
Before we construct the testing system, let’s briefly review how testing can be applied across the three key stages of LLM app development:
- Design: In this stage, LLM tests are applied directly to your application. These tests can be assertions executed at runtime that feed failures back to the LLM for self-correction (illustrated in the sketch below). ...
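To make the idea of a runtime assertion concrete, here is a minimal sketch (not taken from the chapter's code) of a check that asserts the model's output is valid JSON and, on failure, feeds the error back to the model so it can self-correct. The model name and the JSON check are assumptions for illustration; substitute the model and checks your application actually uses.

```python
import json

from langchain_openai import ChatOpenAI

# Assumes the langchain-openai package is installed and OPENAI_API_KEY is set;
# swap in whichever chat model your app already uses.
model = ChatOpenAI(model="gpt-4o-mini")


def generate_json_with_assertion(prompt: str, max_retries: int = 2) -> dict:
    """Call the model, assert the output is valid JSON, and on failure
    feed the error back to the model so it can self-correct."""
    current_prompt = prompt
    for _ in range(max_retries + 1):
        output = model.invoke(current_prompt).content
        try:
            # The assertion: output must parse as JSON. Replace this with
            # whatever runtime check your application needs.
            return json.loads(output)
        except json.JSONDecodeError as err:
            # Feed the failure back to the LLM for self-correction.
            current_prompt = (
                f"{prompt}\n\nYour previous answer was not valid JSON ({err}).\n"
                f"Previous answer:\n{output}\n\nReturn only valid JSON."
            )
    raise ValueError("Output failed the JSON assertion after all retries")
```

The same pattern works for any design-time assertion: validate the output, and if it fails, append the failure details to the prompt and retry a bounded number of times.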