Building Memory with Vector Databases and RAG Architecture
Published by O'Reilly Media, Inc.
From embeddings to production-grade hybrid search for AI
What you’ll learn and how you can apply it
- Build and deploy a functional vector database for RAG applications
- Compare Chroma, Pinecone, and Weaviate and select the right tool for the job
- Understand the internal indexing algorithms (HNSW, IVF) that make vector search scalable
- Implement hybrid search (keyword plus semantic) to reduce AI hallucinations and “keyword blindness”
Course description
Move beyond surface-level RAG demos to master the foundation of intelligent retrieval systems: vector databases and memory architecture. Join Sumit Shukla on a deep dive into how vector databases work, moving from embedding strategies to indexing internals through live comparisons of tools like Chroma, Pinecone, and Weaviate. You’ll learn what actually happens when you store and retrieve semantic memory—and how to tune your system for performance, accuracy, and cost.
You’ll explore real-world challenges like scaling to millions of memory records, optimizing hybrid search (combining dense vectors with filters, keywords, and metadata), and designing memory pipelines that persist, retrieve, and update user interactions across sessions. By the end of the course, you’ll understand vector-backed memory not just as a plug-in but as an architectural layer that makes advanced agents and context-rich systems possible.
This live event is for you because...
- You’re a backend developer or data scientist transitioning into an AI engineering role.
- You’re a software engineer, machine learning engineer, or technical lead.
Prerequisites
- Access to a Google account (for Colab)
- An OpenAI API key or a Hugging Face access token (free)
- Proficiency in Python
- Basic understanding of LLMs (prompts, context windows)
- Knowledge of RAG basics
- Experience with the OpenAI API
- Understanding of AI frameworks
- No prior experience with vector DBs required
Recommended preparation:
- OpenAI API setup
Recommended follow-up:
- Take Building Reliable RAG Applications: From PoC to Production (live online course with Sarang Sanjay Kulkarni)
- Take AI Memory Management in Agentic Systems (live online course with Richmond Alake)
Schedule
The time frames are only estimates and may vary according to how the class is progressing.
Understanding vector memory and RAG foundations (60 minutes)
- Presentation: Why SQL fails at semantic search; visualizing embeddings and vector spaces
- Group discussion: Where the vector DB fits in the LLM stack and why context windows aren't enough
- Hands-on exercise: Create embeddings with Python and calculate cosine similarity manually to understand the math
- Q&A
- Break
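The manual cosine similarity calculation named in the hands-on exercise can be sketched in plain Python. This is a minimal illustration, not the course's notebook: the function name `cosine_similarity` and the three-dimensional toy vectors are assumptions made up for this example, and real embeddings from a model have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (made-up values, not real model output)
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print("cat vs kitten: ", round(cosine_similarity(cat, kitten), 3))
print("cat vs invoice:", round(cosine_similarity(cat, invoice), 3))
```

Vectors that point in similar directions score close to 1, and unrelated ones score close to 0, which is the property a vector database ranks on.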
The vector DB showdown (60 minutes)
- Presentation: Comparing Chroma (local), Pinecone (managed/serverless), and Weaviate (AI native)
- Hands-on exercise: Implement the same RAG ingestion pipeline in Chroma and Faiss to see the syntax and architectural differences
- Group discussion: When to use serverless or self-hosted; cost and privacy
- Break
Indexing algorithms (60 minutes)
- Presentation: How to search a million vectors in milliseconds; visualizing flat, inverted file (IVF), and hierarchical navigable small world (HNSW) indexing
- Hands-on exercise: Run a script to insert 10K vectors and compare the search times of a “brute force” scan and an HNSW index
- Q&A
- Break
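The trade-off between exact and indexed search can be sketched with the standard library alone. Note the swap: this example uses a toy inverted file (IVF) index rather than HNSW, because IVF fits in a few lines of dependency-free Python. The class `ToyIVF`, the helper names, and the data sizes are all assumptions for illustration, and the centroids are sampled at random instead of being trained with k-means as real indexes usually do.

```python
import random
import time

def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def flat_search(vectors, query, k):
    """Exact (brute-force) search: score every vector and keep the k closest."""
    order = sorted(range(len(vectors)),
                   key=lambda i: (squared_l2(vectors[i], query), i))
    return order[:k]

class ToyIVF:
    """Toy inverted-file index: vectors are bucketed by their nearest centroid."""

    def __init__(self, vectors, n_lists, seed=0):
        rng = random.Random(seed)
        self.vectors = vectors
        # Assumption: centroids are randomly sampled vectors.
        # Production IVF indexes usually train centroids with k-means.
        self.centroids = rng.sample(vectors, n_lists)
        self.lists = [[] for _ in range(n_lists)]
        for i, v in enumerate(vectors):
            self.lists[self._nearest_centroids(v, 1)[0]].append(i)

    def _nearest_centroids(self, v, n):
        order = sorted(range(len(self.centroids)),
                       key=lambda c: (squared_l2(self.centroids[c], v), c))
        return order[:n]

    def search(self, query, k, n_probe):
        """Scan only the n_probe buckets whose centroids are closest to the query."""
        candidates = [i for c in self._nearest_centroids(query, n_probe)
                      for i in self.lists[c]]
        candidates.sort(key=lambda i: (squared_l2(self.vectors[i], query), i))
        return candidates[:k]

# Synthetic data: 2,000 random 8-dimensional vectors (kept small for pure Python)
rng = random.Random(42)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(2000)]
query = [rng.gauss(0, 1) for _ in range(8)]
index = ToyIVF(vectors, n_lists=16)

start = time.perf_counter()
exact = flat_search(vectors, query, k=5)
flat_seconds = time.perf_counter() - start

start = time.perf_counter()
approx = index.search(query, k=5, n_probe=2)
ivf_seconds = time.perf_counter() - start

# Recall: fraction of the true top-5 that the approximate search also found
recall = len(set(exact) & set(approx)) / len(exact)
print(f"flat: {flat_seconds:.4f}s  ivf: {ivf_seconds:.4f}s  recall@5: {recall:.2f}")
```

Raising `n_probe` scans more buckets, which costs time but improves recall. When `n_probe` equals `n_lists`, every bucket is scanned and the result matches the exact search.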
Advanced patterns—hybrid search (50 minutes)
- Presentation: Why an exact-match query like “Error 503” fails in pure vector search; introduction to sparse vectors and BM25
- Hands-on exercise: Use LangChain to combine a sparse retriever (keyword) and a dense retriever (semantic) with reciprocal rank fusion (RRF)
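Reciprocal rank fusion itself is small enough to sketch without any framework. The version below is plain Python, not LangChain's API: the function name `reciprocal_rank_fusion`, the document IDs, and the two input rankings are hypothetical, and the constant `k=60` is a commonly used default rather than a value taken from the course.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1. Higher total score ranks first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from a keyword (sparse) and a semantic (dense) retriever
keyword_hits = ["doc_503", "doc_timeout", "doc_retry"]
semantic_hits = ["doc_outage", "doc_503", "doc_timeout"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Documents that both retrievers return rise to the top, because RRF uses only rank positions and never has to reconcile BM25 scores with cosine similarities on different scales.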
Wrap-up and Q&A (10 minutes)
Your Instructor
Sumit Shukla
Sumit Shukla is an experienced AI practitioner and educator specializing in large language models and enterprise AI infrastructure. As director of AI at Scaletrix.ai, he leads teams building production-grade RAG systems for financial and healthcare clients. With a background in ed tech, Sumit excels at breaking down complex mathematical concepts—like vector spaces and indexing algorithms—into intuitive, hands-on engineering lessons.