Foreword by Sharon Zhou
The first time I saw a RAG system fail in production, it was because someone had naively chunked their documents on fixed character boundaries and split a legal clause in half. Clause A was in one chunk with some of clause B, and the rest of clause B was in another. The problem was that the second chunk contained a useful, common exception. Based on the user’s question, the RAG system retrieved the first chunk but not the second, so the model answered with the opposite of what the contract said. Just think: if you were given incomplete or faulty knowledge through a Google search, you’d also have trouble giving the right answer.
No one building the system had been thinking critically about chunking strategy. They had been busy debating which LLM to use. That’s why a book like this is so important for those building RAG with LLMs and agents in production.
RAG looks deceptively simple:
- Chunk your documents—easy, that’s a string split.
- Embed your chunks—easy, that’s a lightweight model API call in a for loop.
- Retrieve the relevant chunks—easy, that’s just using search, which has been around a lot longer than modern AI, so in a way, it should have best practices baked in already.
- Hand those chunks to an LLM—easy, that’s just appending strings to another string to form a prompt.
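To make the four steps above concrete, here is a minimal sketch in Python of the naive pipeline, and of how fixed-character chunking can separate a rule from its exception. Everything in it is an illustrative assumption rather than anything from the book: the function names, the sample contract text, the chunk size of 60, and especially `embed`, which is a toy bag-of-words count standing in for a real embedding model.

```python
import math
import re
from collections import Counter


def chunk_fixed(text: str, size: int) -> list[str]:
    """Step 1 (naive): split on fixed character boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def embed(text: str) -> Counter:
    """Step 2 (toy stand-in, NOT a real embedding model): word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Step 3: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Step 4: append the retrieved chunks to the question."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {query}"


# Hypothetical contract text: a rule followed by its exception.
contract = (
    "Clause B: The tenant may not sublet the apartment. "
    "Exception: subletting is allowed with written consent."
)

chunks = chunk_fixed(contract, 60)  # the boundary falls inside the exception
query = "May the tenant sublet the apartment?"
top = retrieve(query, chunks, k=1)
prompt = build_prompt(query, top)

# The retrieved context states the prohibition but not the exception.
print(prompt)
```

In this toy case the retrieved chunk contains the prohibition, while the text of the exception sits in the other chunk, which shares no words with the question and is never retrieved. Chunking on clause boundaries instead would keep the rule and its exception in the same chunk here.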
You can build a working prototype by one-shotting a language model. But… you can also spend the next year working through parsing, chunking, ...