AI Agent Memory Management Bootcamp

Intermediate

Empowering AI agents with robust memory

Course outcomes

Differentiate between short-term, long-term, episodic, semantic, and procedural memory and explain how each maps to agent behaviour
Implement core memory operations (initialization, segmentation, retrieval, update, deletion, and creation) within AI agents
Apply lexical, vector, and hybrid RAG techniques to integrate memory into LLM-based applications
Configure Oracle AI Database as a memory provider with document storage, indexing, and vector search
Distinguish between memory-augmented and memory-aware agents, and between agent-triggered and deterministic memory operations
Build end-to-end memory-aware agents using Oracle AI Database, LangGraph, and LangMem
Evaluate RAG pipelines using IR metrics, RAGAS, and Galileo, isolating retrieval failures from generation failures
Assess agent performance across task completion, tool use, trajectory quality, cost, and safety using AgentBench, AgentEval, and LangSmith
Benchmark memory systems using LoCoMo, LongMemEval, and MemBench, and apply the RBC framework to assess system-level memory quality
Run structured competitive benchmarks across memory architectures and translate results into architecture decisions

Join expert Richmond Alake for a two-day deep dive into the theory, research, and practical implementation of memory management and evaluation in AI agent systems. Day 1 establishes the conceptual and engineering foundations. It starts with what memory means for an AI agent and how it parallels human cognition, then moves through the core components of a robust memory management system: initialization, segmentation, retrieval, updating, deletion, and creation. You will examine advanced retrieval-augmented generation (RAG) techniques, including lexical, vector-based, hybrid, and agentic RAG, and gain hands-on experience implementing context engineering and memory engineering techniques from the ground up. The day closes with a practical session on implementing memory-aware agent workflows using Oracle AI Database, LangGraph, and LangMem, with a clear focus on two distinctions: memory-augmented versus memory-aware agents, and agent-triggered versus deterministic memory operations.

Day 2 shifts focus to evaluation. You will build the skills to measure and diagnose memory system performance at every layer, from classical IR metrics and RAG pipeline evaluation through to agent-level assessment and memory-specific benchmarking. Coverage includes production-grade evaluation frameworks such as RAGAS, Galileo, AgentBench, and LangSmith, as well as dedicated memory benchmarks including LoCoMo, LongMemEval, and MemBench. The day concludes with a structured benchmarking and competitive analysis module, where you will run comparative evaluations across memory system architectures and translate the results into concrete architecture decisions.

What you’ll learn and how you can apply it

Explain what constitutes memory in AI agents, including memory types and analogies with human cognition
Implement memory operations (initialization, segmentation, retrieval, update, deletion, and creation) within agent workflows
Apply context engineering and memory engineering techniques and understand where each discipline begins and ends
Distinguish memory-augmented from memory-aware agents, and agent-triggered from deterministic memory operations
Use generative agents, A-Mem, MemGPT, and agent workflow memory as reference architectures for memory system design
Implement lexical, vector, and hybrid RAG approaches with hands-on examples
Configure Oracle AI Database for persistent memory storage with document modeling, indexing, and native vector search
Embed memory into agent workflows for real-time decision-making and long-horizon planning
Apply IR and ranking metrics to evaluate and diagnose retrieval quality in production memory systems
Evaluate RAG pipelines with RAGAS and Galileo and isolate retrieval from generation failures
Assess agent behaviour across task completion, tool use, trajectory, latency, cost, and safety
Apply LoCoMo, LongMemEval, and MemBench to assess memory quality across the full lifecycle
Run competitive benchmarks across memory architectures and produce analysis reports that inform design decisions
Build context-aware, adaptive agents with persistent and dynamic memory
Design systems for long-term learning, personalization, and reduced computational overhead

This live event is for you because...

You’re an AI or ML engineer who works on LLM-based applications and needs to implement persistent memory for complex tasks.
You’re a data scientist who builds intelligent systems and wants to understand how to enhance models with robust memory management.
You’re a software developer in the AI space who needs to implement scalable and efficient memory solutions for agent-based applications.
You’re an experienced AI practitioner looking to advance your knowledge of state-of-the-art memory architectures for agentic systems.

Prerequisites

Familiarity with large language models (e.g., GPT-3.5, GPT-4) and their limitations (context windows, tokenization)
Proficiency in Python, along with experience using common AI/ML libraries
Basic experience with databases (preferably Oracle AI Database) and data storage concepts
A general understanding of concepts like embeddings, vector search, and retrieval-augmented generation (RAG) will be helpful
(Optional) Exposure to cognitive architectures (e.g., Soar, ACT-R) or previous experience with AI agents can provide additional context but is not required

Recommended preparation:

Install Python (version 3.8 or higher)
Ensure you have access to an Oracle AI Database instance (either local or cloud-hosted) and familiarize yourself with its basic operations

Recommended follow-up:

Read LLM Engineer’s Handbook (book)
Read Building Applications with AI Agents (book)
Read AI Engineering (book)
Read RAG-Driven Generative AI (book)
Read Hands-On Large Language Models (book)

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Day 1

AI agent memory (55 minutes)

Presentation: Overview of AI memory concepts: short-term (context window) versus long-term (conversational memory), episodic versus semantic, and their differences; analogies to human cognition and memory; challenges of memory in AI agents and agentic systems; real-world examples and intuition behind AI memory
Group discussion: Which aspect of memory do you find most challenging to conceptualize?
Q&A
Break

Agent memory management (65 minutes)

Presentation: Overview of retrieval-augmented generation (RAG) methods: lexical, vector, hybrid, agentic RAG, and rerankers; core components of memory management: initialization, segmentation, structure, creation, retrieval, update, deletion; best practices and design patterns in agentic systems; overview of memory design architectures from generative agents, A-Mem, MemGPT, and agent workflow memory
Group discussion: Which memory component do you believe is most critical for long-term agent coherence?
Q&A
Break
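
To give a feel for the hybrid approach listed above, here is a minimal sketch of one common way to combine a lexical ranking with a vector ranking: reciprocal rank fusion (RRF). It is an illustration only, not the course's implementation. The function name `reciprocal_rank_fusion`, the memory IDs, and the constant `k=60` (a conventional default for RRF) are assumptions made for this example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into a single ranking.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher fused scores rank first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from a lexical search and a vector search.
lexical_hits = ["m3", "m1", "m7"]
vector_hits = ["m1", "m5", "m3"]

fused = reciprocal_rank_fusion([lexical_hits, vector_hits])
print(fused)
```

Items that appear near the top of both lists rise in the fused ranking, which is why RRF is a popular baseline when the two retrievers produce scores on incompatible scales.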

AI retrieval (60 minutes)

Presentation: Overview of AI retrieval: definition, concepts, and techniques; introduction to the concept of "memory providers" and their role in AI application architecture; examples of memory providers
Hands-on exercises: Setting up the environment; indexing and vector search implementation; building retrieval functions; end-to-end agentic RAG system; performance monitoring and logging implementation
Q&A
Break

Agent memory technique implementation (50 minutes)

Hands-on exercises: Implementing memory management components using Oracle AI Database, LangGraph, and LangMem
Hands-on exercises: Implementing context engineering and memory engineering techniques from the ground up: encoding strategies, context window management, and memory injection patterns
Hands-on exercises: Understanding the difference between memory-augmented and memory-aware agents: architectural distinctions, capability boundaries, and when each pattern applies
Hands-on exercises: Overview and implementation of agent-triggered and deterministic memory operations: when memory actions are initiated by the agent versus enforced by the system, and the trade-offs between the two
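
The contrast between agent-triggered and deterministic memory operations can be sketched in a few lines. This is a toy illustration under stated assumptions: `MemoryStore`, `toy_agent`, and the keyword rule inside it are hypothetical stand-ins for a real memory provider and a real LLM call, and none of them come from Oracle AI Database, LangGraph, or LangMem.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    """Hypothetical in-memory store standing in for a real memory provider."""
    records: list = field(default_factory=list)

    def create(self, text: str) -> None:
        self.records.append(text)

def toy_agent(user_message: str) -> dict:
    """Stand-in for an LLM call: it asks to save only messages stating a preference."""
    wants_save = "prefer" in user_message.lower()
    return {"reply": "Noted.", "save_memory": wants_save}

def deterministic_turn(store: MemoryStore, user_message: str) -> str:
    """Deterministic: the system writes every message, whatever the agent decides."""
    result = toy_agent(user_message)
    store.create(user_message)
    return result["reply"]

def agent_triggered_turn(store: MemoryStore, user_message: str) -> str:
    """Agent-triggered: memory is written only when the agent requests it."""
    result = toy_agent(user_message)
    if result["save_memory"]:
        store.create(user_message)
    return result["reply"]

messages = ["Hi there", "I prefer metric units", "What's the weather?"]
deterministic_store, agent_store = MemoryStore(), MemoryStore()
for message in messages:
    deterministic_turn(deterministic_store, message)
    agent_triggered_turn(agent_store, message)

print(len(deterministic_store.records), len(agent_store.records))
```

The deterministic path guarantees coverage at the cost of storing noise, while the agent-triggered path stores less but depends on the agent's judgment about what is worth keeping.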

Wrap-up and Q&A (10 minutes)

Day 2

Traditional evaluation metrics and RAG pipeline metrics (55 minutes)

Presentation: Anatomy of a memory-aware agent and the case for evaluation: memory store, ingestion pipeline, retrieval layer, context assembly layer, and agent reasoning loop, each mapped to its failure modes and evaluation targets; classical IR foundations: Precision, Recall, and F1; ranking quality metrics: MRR, nDCG, and MAP; RAG-specific evaluation dimensions: Faithfulness, Answer Relevance, Context Precision, and Context Recall; RAG evaluation frameworks: RAGAS and Galileo; chunking and indexing quality signals
Notebook: Computing IR metrics from scratch in Python; RAG evaluation with RAGAS; production tracing and evaluation with Galileo
Q&A
Break
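
As a rough preview of what computing IR metrics from scratch can look like, the sketch below implements precision@k, recall@k, reciprocal rank, and binary-relevance nDCG@k using only the standard library. The function names and the sample document IDs are assumptions for illustration, and the nDCG variant shown uses binary relevance (a document is either relevant or not), which is one of several conventions.

```python
import math

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved items that are relevant (assumes k > 0)."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved.

    MRR is the mean of this value over a set of queries.
    """
    for rank, d in enumerate(retrieved, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, d in enumerate(retrieved[:k], start=1) if d in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

# Hypothetical ranking for one query, with its set of relevant documents.
retrieved = ["d2", "d9", "d1", "d4"]
relevant = {"d1", "d2", "d3"}

print(precision_at_k(retrieved, relevant, 3))
print(recall_at_k(retrieved, relevant, 3))
print(reciprocal_rank(retrieved, relevant))
print(ndcg_at_k(retrieved, relevant, 3))
```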

Agent evaluation metrics (55 minutes)

Presentation: Why agent evaluation differs from RAG evaluation; task completion and goal achievement: binary and partial-credit scoring; tool use accuracy: precision and recall applied to tool invocation sequences; trajectory and decision quality: trajectory faithfulness metrics; latency, efficiency, and cost metrics; agent evaluation frameworks: AgentBench, AgentEval, and LangSmith; safety and alignment metrics: adversarial test sets and red-teaming
Notebook: Scoring task completion; tool use evaluation; trajectory evaluation with LangSmith; efficiency and cost profiling
Q&A
Break
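
One simple way to apply precision and recall to tool invocations is to compare the tools an agent actually called against a reference list for the task. The sketch below is an assumption-laden simplification: it treats calls as a multiset of tool names, ignoring call order and arguments, and the tool names and the `tool_use_scores` helper are hypothetical.

```python
from collections import Counter

def tool_use_scores(expected_calls, actual_calls):
    """Return (precision, recall) over tool names, counted as a multiset.

    A call counts as matched when the same tool name appears in both lists;
    duplicate calls only match as many times as the reference contains them.
    """
    expected, actual = Counter(expected_calls), Counter(actual_calls)
    matched = sum((expected & actual).values())
    precision = matched / len(actual_calls) if actual_calls else 0.0
    recall = matched / len(expected_calls) if expected_calls else 0.0
    return precision, recall

# Hypothetical reference trajectory and observed agent trajectory.
expected = ["search_memory", "get_weather", "write_memory"]
actual = ["search_memory", "search_memory", "get_weather", "send_email"]

precision, recall = tool_use_scores(expected, actual)
print(precision, recall)
```

Low precision here points to unnecessary or redundant calls, while low recall points to required steps the agent skipped. A sequence-aware variant would also need to check ordering and arguments.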

Memory evaluation as a distinct discipline (MemEval) (55 minutes)

Presentation: Memory-specific failure modes not captured by RAG or agent metrics; unit-level memory evaluation: precision and recall at the memory-block level; memory retrieval evaluation: MRR, nDCG, and Hit Rate applied to memory queries, including suppression quality; continuity and consistency evaluation: cross-session correctness, persona stability, and contradiction detection; memory lifecycle evaluation: ingestion, consolidation, forgetting, and archiving; LoCoMo, LongMemEval, and MemBench; the RBC lens for memory evaluation: Reliable, Believable, Capable
Notebook: Evaluating memory units in practice; memory retrieval scoring; running LoCoMo, LongMemEval, and MemBench; building a continuous memory evaluation pipeline
Q&A
Break

Memory benchmarking and competitive analysis (65 minutes)

Presentation: What memory benchmarking means in practice: defining comparison criteria, selecting representative workloads, and establishing baseline performance; overview of memory system architectures under comparison: filesystem-based, vector database, relational, and hybrid approaches; evaluation dimensions for competitive analysis: retrieval latency, recall quality, consistency under load, memory lifecycle support, and cost per operation; interpreting benchmark results: avoiding benchmark gaming, understanding workload sensitivity, and translating results into architecture decisions
Notebook: Running a structured benchmark across two or more memory system configurations; scoring each system against retrieval quality, latency, and lifecycle metrics; producing a comparative analysis report with findings and recommendations
Q&A
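
A structured benchmark of the kind described above can be reduced to a small harness: run the same workload against each configuration and record a quality metric alongside latency. The sketch below is a toy under stated assumptions. The two retrievers, the `MEMORIES` data, and the `WORKLOAD` queries are hypothetical stand-ins and do not represent the filesystem, vector, relational, or hybrid architectures compared in the session.

```python
import time

# Hypothetical stored memories, keyed by ID.
MEMORIES = {
    "m1": "user prefers metric units",
    "m2": "user lives in Lagos",
    "m3": "user is allergic to peanuts",
}

def exact_match_retriever(query, k=1):
    """Baseline: return memories whose text contains the full query string."""
    return [mid for mid, text in MEMORIES.items() if query in text][:k]

def token_overlap_retriever(query, k=1):
    """Rank memories by the number of lowercase tokens shared with the query."""
    q = set(query.lower().split())
    scored = [(len(q & set(text.lower().split())), mid)
              for mid, text in MEMORIES.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [mid for score, mid in scored if score > 0][:k]

# Each workload item pairs a query with the memory ID it should retrieve.
WORKLOAD = [
    ("metric units", "m1"),
    ("peanuts allergic", "m3"),
    ("lives in Lagos", "m2"),
]

def run_benchmark(retriever, workload):
    """Run every query once; report hit rate and mean latency in milliseconds."""
    hits, elapsed = 0, 0.0
    for query, expected_id in workload:
        start = time.perf_counter()
        results = retriever(query)
        elapsed += time.perf_counter() - start
        hits += expected_id in results
    return {
        "hit_rate": hits / len(workload),
        "mean_latency_ms": 1000 * elapsed / len(workload),
    }

report = {
    "exact_match": run_benchmark(exact_match_retriever, WORKLOAD),
    "token_overlap": run_benchmark(token_overlap_retriever, WORKLOAD),
}
for name, result in report.items():
    print(name, result)
```

Latency figures from a single pass over three queries are noisy; a realistic benchmark would repeat runs, use a representative workload, and report percentiles rather than a mean.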

Wrap-up and Q&A (10 minutes)

Your Instructor

Richmond Alake
Richmond Alake is a highly experienced Machine Learning Architect and Engineer with over five years of expertise in the field. He specializes in Computer Vision and Deep Learning and has a proven track record of successfully developing and integrating deep learning models to solve a wide range of problems, such as motion detection, object detection, and pose estimation. Throughout his career, he has worked with a diverse range of clients, including large conglomerates, financial institutions, and small startups. In addition to his professional work, Richmond also serves as an AI advisor to a number of startups in the UK and the US.

With a background in building websites and mobile applications, Richmond is a firm believer in using technology to solve everyday problems. He has extensive knowledge of Machine Learning and has written over 200 articles on the subject, gaining over a million views. He was recognized as one of Medium's top AI writers in 2020/2021 and has collaborated with companies such as O'Reilly, BuiltIn and Nvidia to develop effective educational and informative learning materials on AI.

Currently, Richmond Alake is a Machine Learning Architect at Slalom Build UK. As the first hire of the machine learning practice in the UK division, he is responsible for helping organizations move from machine learning research to productionisation and assisting maturing organizations in promoting AI models into existing infrastructure to drive commercial and business value. His main role as an ML Architect is to assist organizations in developing and maintaining machine learning pipelines by implementing MLOps principles, techniques, and tooling. He is well-versed in Feature Stores and has conducted internal training for Data Engineers, Data Scientists, and ML Engineers.

