Chapter 5. Embedding Vectors, Vector Stores, and Running Models Locally
This chapter introduces three key concepts that form the foundation of almost all AI-powered applications: embedding vectors, vector stores, and their combination with augmented queries in an architecture called retrieval-augmented generation. It also covers local model inference, focusing on the practical use of local LLMs and how to interact with them through Java-based tools and frameworks. For developers, this is essential for integrating AI capabilities effectively into applications running on their own machines.
You’ll learn how embeddings capture semantic meaning from raw input, how vector stores enable efficient similarity-based retrieval, and how these components integrate with LLMs to power features like semantic search, classification, and long-context memory. The emphasis is on running these capabilities locally, whether the motivation is performance, cost, privacy, or the need to work offline.
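To make similarity-based retrieval concrete before the details arrive, here is a minimal sketch in plain Java. It assumes cosine similarity as the comparison metric and uses hand-made three-dimensional vectors in place of real model output. The class name ToyVectorSearch, its method names, and the sample values are all hypothetical and chosen only for illustration. A real vector store would use an index rather than the linear scan shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class: not part of any library or of the book's code.
class ToyVectorSearch {

    // Cosine similarity: dot(a, b) / (|a| * |b|). Values near 1 mean "same direction".
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Linear scan over the store: return the key whose vector is most similar to the query.
    static String nearest(Map<String, double[]> store, double[] query) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : store.entrySet()) {
            double score = cosineSimilarity(entry.getValue(), query);
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    // Assumed toy data: hand-made vectors, NOT produced by an embedding model.
    static Map<String, double[]> sampleStore() {
        Map<String, double[]> store = new LinkedHashMap<>();
        store.put("cat", new double[] {0.9, 0.1, 0.0});
        store.put("dog", new double[] {0.8, 0.2, 0.1});
        store.put("car", new double[] {0.1, 0.1, 0.9});
        return store;
    }

    public static void main(String[] args) {
        // Pretend this vector is the embedding of a query such as "kitten".
        double[] query = {0.85, 0.15, 0.0};
        System.out.println(nearest(sampleStore(), query)); // prints: cat
    }
}
```

The query vector points in almost the same direction as the "cat" entry, so that entry scores highest. Real embeddings typically have hundreds or thousands of dimensions, but the comparison works the same way.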
This is a foundational chapter that prepares you for the hands-on implementations in the rest of the book. It builds the necessary understanding of how embeddings and local inference relate to each other, so you can confidently apply them to Java applications in the chapters that follow.
Embedding Vectors and Their Role
Before LLMs can reason about data, they need a way to interpret it. They do this with numbers. This is why we need to talk about embedding vectors. In this section, you’ll learn what ...