AI Superstream: Context Engineering
Published by O'Reilly Media, Inc.
Build reliable AI (agentic) systems with prompts, tools, RAG, memory, and more
The performance and reliability of an AI-powered system are determined not only by the strength of the underlying AI model but also by the data, or context, the model is given for a particular task. Engineering large contexts that an AI model, or an even more complex agentic AI system, can leverage is challenging, and bigger is not always better. Join our experts to explore the art and skill of context engineering and its essential components, from prompting and retrieval to tool use and memory.
What you’ll learn and how you can apply it:
- Learn the essential building blocks of context engineering for AI
- Explore the challenges of context engineering and the techniques and tools that are used by industry experts to address them
- Learn from the real-world experience of engineers who are building agentic AI systems
Recommended follow-up:
- Read Prompt Engineering for LLMs (book)
- Take AI Memory Management in Agentic Systems (live online course with Richmond Alake)
- Take Context Engineering with MCP (live online course with Tim Warner)
- Take AI Context Engineering (live online course with Skanda Vivek)
- Read Context Engineering with DSPy (early release book)
- Take Hands-On Context Engineering (live online course with Lucas Soares)
- Take Context Engineering for AI-Assisted Coding (live online course with Chelsea Troy)
Schedule
The time frames are only estimates and may vary according to how the class is progressing.
Introduction – Angelina Yang (5 minutes)
Angelina welcomes you to the AI Superstream.
Keynote – Scouting the Web: What Happens When the Entire Web is the Agent’s Context? – Dhruv Batra (15 minutes)
Imagine an AI agent that monitors the web—for news, product price drops, reservations, tickets, leads—anything you care about. The agent must be always-on for weeks or months, broad in its coverage of the entire web, adaptable to changing information online and users’ preferences, precise in its reporting with citations, self-aware enough to not repeat itself and to contextualize new findings, all while not blowing up its context and your budget. How would you build such an agent? Dhruv Batra, Yutori's cofounder and chief scientist, describes how the company did it.
How Long Contexts Fail (and How to Fix Them) – Drew Breunig (30 minutes)
Million-token context windows promised a new era for AI agents—just throw everything in the prompt and let the model handle it. But in practice, overloaded contexts fail in predictable ways: hallucinations compound over time, models repeat past actions instead of reasoning forward, irrelevant tools degrade output quality, and accumulated information contradicts itself. Writer and technology leader Drew Breunig draws from recent research to show you real examples of each of these failure modes and introduces techniques that keep contexts under control, including selective retrieval, dynamic tool loading, context quarantine, pruning, summarization, and offloading.
Break (5 minutes)
Context Rot: Its Implications and Potential Solutions – Jeff Huber (30 minutes)
We assume that large language models process context uniformly: that a model should handle thousands of tokens just as reliably as it handles a hundred. In practice, this assumption does not hold, even on simple tasks. Jeff Huber, CEO and cofounder of Chroma, shares the results of Chroma’s evaluation of 18 large language models, including state-of-the-art models such as GPT-4.1, Claude 4, Gemini 2.5, and Qwen3, showing that model performance grows increasingly unreliable as input length increases.
How to Tell If Your Agent Used the Right Stuff – Apurva Misra (30 minutes)
Many so-called “agent failures” are actually context failures in disguise. Apurva Misra, founder of Sentick, explains techniques that help you tell whether your agent saw and used the right context, including tracing and attribution, golden datasets for context-aware evaluation, and targeted probes to test retrieval quality. You’ll come away with a practical toolkit for approaching bad answers as context problems and for systematically improving reliability without reaching for a bigger model.
Break (5 minutes)
Linking Memory to Context via Knowledge Graphs and Ontologies – Paul Iusztin (30 minutes)
Even though LLMs boast support for inputs of over 150,000 tokens, their performance degrades quickly as inputs grow. “Bad” data within your context window only compounds this phenomenon, known as “context rot.” You might think the solution is simply to keep only relevant data in the context window, but it’s not that simple. Siloed data, along with data aggregated from private databases, email, video meetings, documentation, and the cloud, creates fragmentation that makes retrieving the correct data difficult. Knowledge graphs are a popular solution. Paul Iusztin, bestselling author of LLM Engineer’s Handbook, explores how to keep context windows in check by offloading your knowledge into a long-term memory layer built with knowledge graphs and ontologies. Using graph-based retrieval techniques (GraphRAG), he also shows how to keep only the data relevant to the task at hand.
Building the Brain: How Predictive Memory Transforms AI Agents – Shawkat Kabbara (30 minutes)
Most AI agents fail in production because they can’t remember or predict context the way human brains do. Shawkat Kabbara, founder and CEO of Papr, demonstrates how to build predictive memory systems inspired by neuroscience, from working memory to episodic recall, using custom schemas and GraphQL. You’ll learn why memory is prediction, not search, and see live demos of a brain-inspired architecture that achieves 91% retrieval accuracy on Stanford’s STARK benchmark while delivering response times under 100 milliseconds for intelligent applications.
Break (5 minutes)
Why AI Gets Confused and How to Fix It: 4 Rules for Context Engineering – Yuzheng Sun (30 minutes)
Building reliable AI agents isn’t just about writing better prompts; it’s about managing the information you feed them. Dr. Yuzheng Sun, founder of Superlinear Academy, breaks context engineering down into four practical principles to stop your AI from hallucinating or getting lost: injecting the right information at the right time, removing noise so the model focuses on the signal, storing knowledge outside the prompt for easy access, and breaking complex tasks into smaller pieces to respect the model’s limits.
Engineering Context Quality by Architecting Agent Memory – Mikiko Bazeley (30 minutes)
Mikiko Bazeley, staff developer advocate at MongoDB, shows how you can use MongoDB Atlas and Voyage AI models to improve context quality and agent performance by designing and implementing agent memory via short-term and long-term persistence.
Closing Remarks – Angelina Yang (5 minutes)
Angelina closes out today’s event.
Your Hosts and Selected Speakers
Angelina Yang
Angelina Yang is the cofounder of West Operators, where she helps companies become discoverable to AI systems like ChatGPT, Claude, and Perplexity. She has built LLM applications powering over 10 million daily interactions and works with founders backed by a16z, Sequoia, and Lightspeed.
Angelina is a two-time fast.ai Fellow under Jeremy Howard and a winner of Anthropic’s 2024 AI developer contest. She’s also the founder and host of TwoSetAI, a YouTube channel with 90K+ subscribers where she interviews AI founders. Her expertise bridges AI engineering and growth strategy, helping businesses turn AI visibility into a predictable channel for discovery.
Dhruv Batra
Dhruv Batra is a cofounder and the chief scientist of Yutori and an adjunct professor at Georgia Tech. Previously, he was a senior director leading embodied AI at Meta and an associate professor in the School of Interactive Computing at Georgia Tech. His research lies at the intersection of machine learning and computer vision, with forays into robotics and natural language processing. His research has been supported by NSF, ARO, ARL, ONR, DARPA, Amazon, Google, Microsoft, and NVIDIA and has been extensively covered by CNN, BBC, CNBC, Bloomberg Business, The Boston Globe, MIT Technology Review, Newsweek, and NPR, among others. He’s a recipient of the Presidential Early Career Award for Scientists and Engineers (2019), the Army Research Office’s Early Career Award (2018), the Office of Naval Research’s Young Investigator Program award (2017), the National Science Foundation CAREER award (2014), and many others.
Drew Breunig
Drew Breunig is a writer and technology leader, currently assembling The Context Engineering Handbook for O’Reilly. He previously ran strategy and data science at PlaceIQ (acquired by Precisely) and cofounded Reporter, an award-winning quantified self app. He writes about AI, data, and geospatial technology at dbreunig.com.
Jeff Huber
Jeff Huber is the CEO and cofounder of Chroma. He has spent 10 years working in applied machine learning and building developer tools.
Apurva Misra
Apurva Misra is a machine learning engineer, speaker, and founder of Sentick, where she helps growing companies unlock practical, ROI-driven AI solutions across automations, predictive analytics, and copilots. She holds a master’s degree from the University of Waterloo, where her research focused on detecting driver cognitive distraction. Her work has been published in IEEE Access. Apurva speaks at industry events and advises startups on shipping AI safely and measurably, drawing on hands-on experience building production systems and a strong cloud/engineering background. In her free time, she’s learning Spanish and is always eager to hear about new hidden-gem eateries.
Paul Iusztin
Paul Iusztin is the author of the bestselling LLM Engineer’s Handbook, lead instructor of an agentic AI engineering course, and founding AI engineer of a San Francisco startup. He’s obsessed with making knowledge accessible through AI. With over 10 years of experience and 20 apps shipped, he teaches AI engineering the way he wanted to learn it at the beginning of his career: end-to-end, from idea to production and from data collection to deployment, monitoring, and evaluation, with a focus on AI principles, software patterns, and infrastructure systems that will thrive in a future dominated by AI coding tools. His ultimate goal is to help other engineers escape PoC purgatory and maximize their AI engineering skills.
Shawkat Kabbara
Shawkat Kabbara is the founder and CEO of Papr, building the predictive memory layer for AI agents with an industry-leading 91% retrieval accuracy on Stanford’s STARK benchmark. His experience spans machine learning, NLP, and search, and he has contributed to products deployed to billions of users. Previously, he was the founding product lead for the Apple Intelligence platform, launching Vision Pro and the App Intents SDK, Apple’s on-device AI action layer. He also held leadership roles at Meta and Microsoft.
Yuzheng Sun
Dr. Yuzheng Sun is the founder of Superlinear Academy and a leading voice in AI education. A top-rated instructor on Maven, he has guided over 2,000 students through 10+ cohorts of his “Build with AI” course, which holds a near-perfect 4.9/5 rating. Previously, Yuzheng served as principal data scientist and evangelist at Statsig (acquired by OpenAI). His deep industry expertise stems from roles including director of data science at Tencent, data scientist at Meta, and economist at Amazon.
Mikiko Bazeley
Mikiko Bazeley is a staff developer advocate at MongoDB, focused on empowering developers to build AI that scales. Previously, she worked as an MLOps engineer, data scientist, and strategic analyst for companies like Intuit Mailchimp, Autodesk, Fireworks AI, and Sunrun.