Chapter 2. Evaluating and Optimizing RAG
Feedback from users has shown that LLM responses can be too generic or noticeably AI-generated. People are very sensitive to small discrepancies, and with numerous options available, customers are likely to abandon a low-quality application in favor of another provider. To build high-quality applications that attract customers, you need to be able to measure performance and make improvements. In this chapter, we will learn how to evaluate RAG applications and which levers are available for optimizing them.
RAG-based applications, in particular, have a number of distinct components that can be optimized according to the use case. At a minimum, these include text extraction, chunking (or splitting), embedding, database choice, retrieval strategy, and LLM choice (including prompt engineering) for generation. Figure 2-1 shows these six components of a basic RAG application.
The components on the left denote the indexing pipeline, where documents are processed, embedded, and added to a database. The components on the right are used for querying the database, retrieving information based on the input query and generating a response.
Figure 2-1. Components of a RAG application
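To make the two pipelines concrete, here is a minimal sketch of the six components in plain Python. Everything in it is an illustrative assumption rather than the book's implementation: the function names are hypothetical, the "embedding" is a simple bag-of-words count, the "database" is an in-memory list, and `generate` only assembles a prompt where a real application would call an LLM.

```python
import math
import re
from collections import Counter


def extract_text(raw: str) -> str:
    """Text extraction: here we only collapse whitespace (toy stand-in)."""
    return re.sub(r"\s+", " ", raw).strip()


def chunk(text: str, size: int = 8) -> list[str]:
    """Chunking: split into fixed-size groups of words (assumed strategy)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Embedding: bag-of-words counts standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    """Indexing pipeline: extract, chunk, embed, and store in a list ("database")."""
    index = []
    for doc in documents:
        for piece in chunk(extract_text(doc)):
            index.append((piece, embed(piece)))
    return index


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Retrieval: rank stored chunks by similarity to the query, keep the top k."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def generate(query: str, context: list[str]) -> str:
    """Generation: build the prompt; a real system would send this to an LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\nQuestion: {query}"


def answer(index: list[tuple[str, Counter]], query: str, k: int = 2) -> str:
    """Query pipeline: retrieve relevant chunks, then generate a response."""
    return generate(query, retrieve(index, query, k))
```

Calling `build_index` once over a list of documents corresponds to the indexing side of the figure, and calling `answer` for each incoming question corresponds to the query side. Because each stage is a separate function, any one of them can be swapped out and measured independently.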
- Step 1: Text extraction (preprocessing)
The first step is to preprocess the documents. This may consist of a few steps depending on where the data comes from, including ...