Chapter 5. Embeddings
Embedding models convert text, images, and other content into vectors that capture semantic meaning. In a RAG system, these vectors let the retriever search large unstructured collections for content that is relevant to a user's question. The retriever embeds the query, compares it to vectors stored in a vector database, and ranks candidates by distance. Smaller distances indicate higher semantic similarity; the top-ranked chunks are then passed to the LLM within its limited context window.
A typical embedding-based retrieval flow looks like this:
- Split documents into chunks and embed them when building the vector database.
- Embed each incoming user query with the same model.
- Compute distances between the query vector and stored vectors.
- Retrieve the closest chunks and pass them to the LLM as context.
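The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the `embed` function here is a toy bag-of-words stand-in for a real embedding model, and the "vector database" is just an in-memory list. The structure of the flow, however, is the same once you swap in a real model and store.

```python
import math
from collections import Counter

# Toy vocabulary; a real embedding model has no such fixed word list.
VOCAB = ["vector", "database", "query", "embedding", "chunk", "search", "model", "context"]

def embed(text):
    # Toy stand-in for a real embedding model: counts over a fixed vocabulary.
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in VOCAB]

def cosine_distance(a, b):
    # 1 - cosine similarity; smaller means more semantically similar.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb) if na and nb else 1.0

# 1. Embed document chunks when building the "vector database".
chunks = [
    "the vector database stores each embedding",
    "search the chunks for relevant context",
    "unrelated note about lunch plans",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Embed the incoming user query with the same model.
query_vec = embed("which database stores the embedding vector")

# 3. Compute distances and rank stored chunks against the query vector.
ranked = sorted(index, key=lambda item: cosine_distance(query_vec, item[1]))

# 4. The closest chunks become the LLM's context.
print(ranked[0][0])
```

Because the query shares the terms "vector", "database", and "embedding" with the first chunk, that chunk ranks closest and would be passed to the LLM as context.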
This chapter shows how to work with embedding models from providers such as OpenAI, Google, and open source projects. You will generate embeddings, visualize semantic relationships, measure vector distances, and use them in practical RAG pipelines. The recipes also cover model selection, multimodal embeddings, and hybrid retrieval that combines vectors with keyword or metadata filters.
You can find all the code examples for this chapter in the book’s GitHub repository.
5.1 Mapping the Linguistic Meaning of Text Chunks to a Numerical Representation
Problem
You want to map the semantic meaning of words and sentences into a numerical representation.
Solution
Use an ...