AI Context Engineering
Published by O'Reilly Media, Inc.
Effectively handle large model contexts to maximize GenAI quality and performance
What you’ll learn and how you can apply it
- Understand the implications and opportunities of large context windows in modern LLMs and vision models
- Design GenAI applications that effectively personalize responses using large-scale contextual data
- Apply context engineering patterns to maximize performance and cost-efficiency in production
- Develop scalable systems that deliver reliable, high-quality generative AI experiences across customer use cases
Course description
Large language and vision models have increasingly large context windows. It’s not uncommon for a model to accept hundreds of thousands, if not millions, of tokens, the equivalent of multiple books stuffed into a single prompt. This opens up many more use cases, such as personalizing GenAI for customers based on their data.
But more context requires more responsibility. Early retrieval solutions tackled the problem of feeding large amounts of context into LLMs with inherently limited context sizes; now the problem is figuring out how to leverage large contexts effectively. With some help from data scientist Skanda Vivek, you’ll learn how to engineer LLM applications around large contexts to fit your specific use case and deliver high-quality experiences for customers at scale.
This live event is for you because...
- You’re an AI/ML engineer looking to build GenAI applications with long-context models.
- You’re a CTO/CDO who wants to integrate LLMs in your business.
- You’re a software engineer who wants to learn more about LLMs to upskill.
- You work in a niche industry and want to learn how to apply LLMs to it while upskilling for a future job.
- You’re a product or engineering manager who aims to scale GenAI-powered personalization.
- You’re a data scientist or NLP practitioner.
- You’re a technical architect designing high-context AI systems for enterprise use.
Prerequisites
- Access to ChatGPT (optional, to follow along with the exercises)
- Familiarity with software development in Python
- An understanding of fundamental machine learning concepts
- Familiarity with foundational LLM concepts and architectures
- Basic understanding of prompt engineering and vector-based retrieval
- Experience with Python and APIs of LLM providers (e.g., OpenAI, Anthropic, Google)
- (Optional) Prior exposure to productionizing AI systems or building GenAI applications
Recommended preparation:
- Explore Hands-On Large Language Models (book)
- Explore Prompt Engineering for Generative AI (book)
Recommended follow-up:
- Read Implementing MLOps in the Enterprise (book)
- Read Generative AI on AWS (book)
Schedule
The time frames are only estimates and may vary according to how the class is progressing.
Foundations of large-context models (60 minutes)
- Presentation: Evolution of context windows in LLMs and vision-language models; RAG and the evolution of context retrieval; the main aspects behind context engineering (instructions, RAG, short/long-term memory, user inputs, structured outputs, tools); planning and organization; using the file system as context and having a todo.md
- Group discussion: How do you use LLMs within your org/business use case? How important is providing the relevant context?
- Hands-on exercise: Given a use case, run large-context experiments with retrieval and generation
- Q&A
- Break
Designing with large contexts (60 minutes)
- Presentation: Prompt engineering for multithousand-token inputs; patterns for structured/semistructured data insertion; retrieval strategies; personalization at scale using customer context in prompts
- Group discussion: What context retrieval methods do you currently use (if any)? Which have you found most effective?
- Hands-on exercise: Explore retrieval strategies for a given use case
- Q&A
- Break
Evaluations with large contexts (60 minutes)
- Presentation: Context failure modes (context poisoning, distraction, confusion, and clashing), how to detect them early on, and how to fix them; developing a representative eval set; boring but critical labeling; choosing eval metrics; automating evals through an LLM judge
- Group discussion: How do you evaluate the impact of added context on model outputs?
- Hands-on exercise: Evaluate LLM outputs with differing contexts
- Q&A
- Break
Deploying and scaling long-context GenAI (60 minutes)
- Presentation: Architecture patterns for long-context GenAI systems; scaling context injection—keeping context fresh and relevant; token budget management and chunk prioritization; effectively handling contexts in production use cases; monitoring for quality, latency, and costs; detecting context failures; optimizing the quality of retrieval
- Hands-on exercise: Design a high-level architecture for a scalable GenAI-powered support assistant, including a context freshness strategy
- Q&A
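As a rough illustration of the "token budget management and chunk prioritization" topic above, here is a minimal Python sketch and not the course's own material. It assumes each retrieved chunk already carries a relevance score, and it greedily keeps the highest-scoring chunks that still fit a fixed token budget. The names Chunk, estimate_tokens, and pack_context are hypothetical, and the word-count token estimate is a stand-in for a real model tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # hypothetical relevance score, e.g. from a retriever


def estimate_tokens(text: str) -> int:
    # Assumption: whitespace-separated words as a crude proxy for tokens.
    # A production system would count with the target model's tokenizer.
    return len(text.split())


def pack_context(chunks: list[Chunk], budget: int) -> tuple[list[Chunk], int]:
    """Greedily select the highest-scoring chunks that fit within `budget`."""
    selected: list[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        cost = estimate_tokens(chunk.text)
        # Skip a chunk that would overflow the budget; a smaller,
        # lower-scoring chunk later in the ranking may still fit.
        if used + cost <= budget:
            selected.append(chunk)
            used += cost
    return selected, used


if __name__ == "__main__":
    candidates = [
        Chunk("refund policy for annual plans", 0.9),
        Chunk("long changelog entry covering many unrelated releases", 0.8),
        Chunk("billing contact", 0.5),
    ]
    picked, used = pack_context(candidates, budget=7)
    for chunk in picked:
        print(f"{chunk.score:.1f}  {chunk.text}")
    print("estimated tokens used:", used)
```

Greedy selection by score is only one possible policy; it ignores chunk ordering, redundancy between chunks, and freshness, all of which a production system would likely need to weigh.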
Your Instructor
Skanda Vivek
Skanda Vivek is a senior data scientist at Intuit, working on generative AI. Previously, he was a senior data scientist on the risk intelligence team at OnSolve, where he developed advanced AI-based algorithms for rapidly detecting critical emergencies through big data. He has also been an assistant professor and a postdoctoral fellow at Georgia Tech. His work has been published in multiple scientific journals as well as broadcast widely by outlets such as the BBC and Forbes. Skanda is passionate about sharing knowledge and teaches data- and AI-focused courses with O’Reilly. He received his PhD in physics from Emory University.