Chapter 13. Testing AI Systems
MLOps is a set of best practices for the automated testing, versioning, and monitoring of the ML pipelines and ML assets that power our AI systems. We introduced MLOps in Chapter 1, data validation tests in Chapter 6, and unit testing for transformation functions in Chapter 7. But there is still much more ground to cover. If you are to build a reliable, governed, and maintainable AI system, you need integration tests for each of your ML pipelines, run both during development and before deployment. In this chapter, we will look at how to write feature pipeline tests and model validation tests, and how to test model deployments. We will also look at how to reliably package our ML pipelines with automatic containerization in development, staging, and production environments, and we will present offline testing of agents and LLM workflows with evals.
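As a quick preview of the kind of test this chapter builds toward, the sketch below shows a pytest unit test for a single feature transformation function, assuming a pandas-based pipeline. The function total_spend_per_account is a hypothetical transformation standing in for one of your own; the point is the pattern of running a transformation on a small, hand-crafted input and asserting on the output.

import pandas as pd
import pytest


def total_spend_per_account(transactions: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical transformation: aggregate transaction amounts per account."""
    return (
        transactions.groupby("account_id", as_index=False)["amount"]
        .sum()
        .rename(columns={"amount": "total_amount"})
    )


def test_total_spend_per_account():
    # Small, hand-crafted input so the expected output is easy to reason about
    transactions = pd.DataFrame(
        {"account_id": [1, 1, 2], "amount": [10.0, 20.0, 5.0]}
    )

    features = total_spend_per_account(transactions)

    # One row per account, with the expected aggregate values
    assert len(features) == 2
    assert features.loc[features["account_id"] == 1, "total_amount"].iloc[0] == pytest.approx(30.0)
    assert features.loc[features["account_id"] == 2, "total_amount"].iloc[0] == pytest.approx(5.0)

The same pattern scales up to the integration tests discussed above: instead of calling a single function, you run the whole pipeline against a small sample of input data and assert on the features it produces.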
Testing is key to building a high-quality AI system. Your tests should give you enough confidence that you are willing to deploy to production on a Friday, and even if an upgrade fails, you should be able to easily roll back your changes. In the next chapter we will focus on the operational concerns of MLOps, but in this chapter, we will look at the tests run during development and how to automate offline testing for AI systems.
Offline Testing
The starting point for building reliable AI systems is testing. AI systems require more levels of testing than traditional software systems. Small bugs in data or code can easily ...