Preface
In 2017, Google researchers published the paper “Attention Is All You Need” and introduced the Transformer architecture, a breakthrough that reshaped modern AI. Over the following years, large foundation models demonstrated what happens when these ideas are scaled. The models began to write coherent text, answer complex questions, and generate working code. For the first time, software systems could interact with language in ways that felt broadly useful in real applications, not just impressive in research demos.
Yet these models had important limitations. They were powerful but isolated: prone to hallucinating facts, lacking access to up-to-date information, and unable to work with private company data. Retrieval-augmented generation (RAG) addresses these gaps by coupling language models with external knowledge sources. I see RAG and agentic RAG as a key step toward AI systems that emulate human problem-solving: they actively gather new information, interpret it in context, and continuously plan their next steps based on what they find. By connecting foundation models to external knowledge sources, RAG grounds model outputs in verifiable data and enables systems to reason over trusted information when handling complex tasks.
This book is about building production-ready RAG systems. Each recipe focuses on a concrete engineering challenge that appears when moving from prototype to dependable application and explains the trade-offs behind key design decisions. You’ll ...