Chapter 3. RAG Part II: Chatting with Your Data
In the previous chapter, you learned how to process your data, create embeddings, and store them in a vector store. In this chapter, you’ll learn how to efficiently retrieve the document chunks most relevant to a user’s query. This lets you construct a prompt that includes relevant documents as context, improving the accuracy of the LLM’s final output.
This process—which involves embedding a user’s query, retrieving similar documents from a data source, and then passing them as context to the prompt sent to the LLM—is formally known as retrieval-augmented generation (RAG).
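To make the flow concrete, here is a minimal sketch of those three steps in Python. It assumes the langchain-openai and langchain-core packages and an OpenAI API key are available; the in-memory vector store, the sample texts, and the gpt-4o-mini model are illustrative stand-ins for whatever store and model you set up in the previous chapter.

```python
# Minimal RAG sketch: embed the query, retrieve similar chunks,
# and pass them as context in the prompt sent to the LLM.
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

embeddings = OpenAIEmbeddings()

# Stand-in for the vector store you populated in the previous chapter.
vector_store = InMemoryVectorStore(embedding=embeddings)
vector_store.add_texts([
    "Our return policy allows refunds within 30 days of purchase.",
    "Shipping typically takes 3-5 business days within the US.",
])

query = "How long do I have to return an item?"

# Retrieval: the query is embedded and compared against stored embeddings.
relevant_docs = vector_store.similarity_search(query, k=2)
context = "\n\n".join(doc.page_content for doc in relevant_docs)

# Augmented generation: the retrieved chunks are injected into the prompt.
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
)
llm = ChatOpenAI(model="gpt-4o-mini")
answer = llm.invoke(prompt)
print(answer.content)
```

The rest of this chapter builds on this basic pattern, swapping in different retrieval strategies and data sources while the overall embed, retrieve, and generate loop stays the same.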
RAG is an essential component of building chat-enabled LLM apps that are accurate, efficient, and up-to-date. In this chapter, you’ll progress from basic to advanced strategies for building an effective RAG system across various data sources (such as vector stores and databases) and data types (structured and unstructured).
But first, let’s define RAG and discuss its benefits.
Introducing Retrieval-Augmented Generation
RAG is a technique used to enhance the accuracy of outputs generated by LLMs by providing context from external sources. The term was originally coined in a paper by Meta AI researchers who discovered that RAG-enabled models are more factual and specific than non-RAG models.1
Without RAG, an LLM relies solely on its pretraining data, which may be outdated. For example, let’s ask ChatGPT a question about a current event ...