Overview
AI models are only as good as the context they can retrieve. Without the right data at the right moment, even the most powerful models fail. You might even say that search and retrieval is the most important layer of the AI stack.
Scaling Search and Retrieval for Contextual AI is your guide to designing modern search infrastructure for contextual AI. Written by Nicholas Knize, the creator of AWS OpenSearch, this book explores the full lifecycle of search systems—from indexing and query execution to sharding, vector search, hybrid retrieval, and real-world AI integration.
What makes this book unique is its systems-first, vendor-neutral approach. Rather than explaining how to operate existing tools, it teaches you how to build the tools themselves. Whether you're modernizing an aging cluster, integrating RAG into your LLM pipeline, or simply trying to understand what makes search and retrieval tick, this is your blueprint.
- Architect search and retrieval systems that enable scalable, performant, and secure AI inference
- Navigate the trade-offs between indexing and retrieval models
- Apply proven patterns to build fault-tolerant, efficient search infrastructure
- Support hybrid and AI-native workloads with structured, unstructured, and vector data
- Optimize performance, storage, and resilience across varied deployment topologies and constraints