Preface
Enterprise data platforms are shaped by who and what consumes data. For most of their history, that consumer was human. Analysts wrote SQL queries, applications executed deterministic transactions, and dashboards reflected predefined metrics. The systems we built, including databases, search engines, and pipelines, were optimized for precision, predictability, and structure. Those assumptions held for decades, and they still matter today.
The adoption of large language models (LLMs) and AI-driven applications introduces a different kind of consumer. Instead of asking precise questions, these systems retrieve information probabilistically and reason over relevance rather than correctness. Techniques such as retrieval-augmented generation (RAG) and semantic search depend on similarity-based retrieval across both structured and unstructured data. This shift does not make traditional databases obsolete, but it does expose clear limits to how they support semantic access to information.
Vector databases are an architectural response to this shift. By storing and retrieving embeddings (numerical representations that capture the meaning of text, images, or other data), vector databases enable AI systems to find relevant data without relying solely on schema or exact matches. In practice, however, many organizations encounter vector databases through isolated experiments or developer tooling. These efforts are often disconnected from enterprise data platforms, governance practices, ...