Develop Self-Improving AI Agents with Reinforcement Learning
Published by O'Reilly Media, Inc.
Build a prototype that can iteratively enhance its own reasoning and collaboration skills
What you’ll learn and how you can apply it
- Understand reinforcement learning for LLMs
- Design an evaluation-driven training loop that measures agent performance through rollouts and feedback with Agent Reinforcement Trainer
- Implement reinforcement learning pipelines that enable agents to learn from rewards, verifiers, and benchmarks with ART and RULER (Relative Universal LLM-Elicited Rewards) to create self-improving workflows
- Develop strategies for optimizing agent reasoning, tool use, and collaboration using Group Relative Policy Optimization (GRPO) and Group Sequence Policy Optimization (GSPO)
- Train agents to use any MCP tool server
- Integrate multi-agent coordination and communication through MCP
Course description
The next generation of AI agents will not just execute instructions; they’ll learn how to get better at working together and using tools effectively to solve real-world problems. In this hands-on course with AI expert and author Nicole Koenigstein, you’ll design and train self-improving agents using reinforcement learning.
You’ll start with the fundamentals of RL for language-model agents and build an evaluation loop where each rollout produces agent trajectories and feedback. You’ll also trace the conceptual chain from training infrastructure through group-level optimization to meta-reward synthesis, turning feedback into learning signals that enhance your agent’s reasoning, decision-making, and tool use. Finally, you’ll learn how to teach agents to use MCP servers and how MCP can facilitate structured communication and shared context across a multi-agent system. By the end of the session, you’ll have a working prototype of a self-improving AI agent that can iteratively enhance its own reasoning and collaboration skills.
This live event is for you because...
- You’re a machine learning engineer, AI researcher, or AI practitioner who wants to improve the reliability and quality of your agentic systems.
- You want to add reinforcement-learning capabilities to existing multi-agent systems to improve their coordination and collective problem-solving.
- You’re an AI practitioner who’s exploring open-ended agentic systems such as code generation or deep search.
- You want to teach agents how to use any external tool via MCP.
Prerequisites
- A Python 3.12 environment (Google Colab recommended)
- Dependencies installed from the provided GitHub repository (link to come)
- An API key for OpenRouter
- Intermediate-level Python programming experience (classes, functions, loops)
- Experience interacting with LLMs and accessing them through an API
- Familiarity with LangGraph
Recommended preparation:
- Read “From LLMs to Agents: The Foundational Blueprint” and “Architectures and Patterns: Planning, Reactivity, and Multi-Agent Systems” (chapters 1 and 2 in AI Agents: The Definitive Guide)
- Read “Reinforcement Learning Transformers” (chapter 7 in Transformers: The Definitive Guide)
Recommended follow-up:
- Read AI Agents: The Definitive Guide (book)
- Read Transformers: The Definitive Guide (book)
Schedule
The time frames are only estimates and may vary according to how the class is progressing.
Introduction to LLM reinforcement learning (60 minutes)
- Presentation: Foundations of RL for LLMs—policies, rollouts, trajectories, and rewards to model LLM behavior
- Demonstration: A minimal reinforcement loop showing how a simple policy learns from relative feedback
- Group discussion: Why RL matters for self-improving agentic systems
- Hands-on exercise: Run a notebook with a simplified Frozen Lake game world; adjust the policy to see how it influences the trajectories and success rate
- Q&A
- Break
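The Frozen Lake exercise above can be previewed with a minimal, self-contained sketch (the course uses its own notebook; the grid layout, rewards, and hyperparameters below are illustrative assumptions, and tabular Q-learning stands in for the policy being adjusted):

```python
import random

GRID = "SFFF" "FHFH" "FFFH" "HFFG"  # 4x4 lake: S start, F frozen, H hole, G goal
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    r, c = divmod(state, 4)
    dr, dc = ACTIONS[action]
    nr, nc = min(max(r + dr, 0), 3), min(max(c + dc, 0), 3)
    nxt = nr * 4 + nc
    return nxt, float(GRID[nxt] == "G"), GRID[nxt] in "HG"

def train(episodes=5000, alpha=0.5, gamma=0.95, seed=0):
    """Off-policy Q-learning: explore with random actions, learn greedy values."""
    rng = random.Random(seed)
    q = [[0.0] * 4 for _ in range(16)]  # one Q-value per (state, action)
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap per episode
            a = rng.randrange(4)
            s2, reward, done = step(s, a)
            # Move Q(s, a) toward reward + gamma * max_a' Q(s', a')
            target = reward + (0.0 if done else gamma * max(q[s2]))
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q

def success_rate(q, trials=100):
    """Follow the greedy policy; fraction of trials that reach the goal."""
    wins = 0
    for _ in range(trials):
        s, done, reward = 0, False, 0.0
        for _ in range(50):
            s, reward, done = step(s, max(range(4), key=lambda a: q[s][a]))
            if done:
                break
        wins += reward == 1.0
    return wins / trials
```

Changing the exploration budget, `gamma`, or the grid's hole placement and re-measuring `success_rate` mirrors the "adjust the policy, watch the trajectories" loop of the exercise.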
Reward modeling and relative scoring (60 minutes)
- Presentation: How reward functions shape behavior; why relative scoring matters more than absolute metrics; intro to Relative Universal LLM-Elicited Rewards (RULER)
- Demonstration: Relative ranking with RULER
- Group discussion: Why relative ranking is more stable and generalizable for open-ended tasks
- Hands-on exercise: Write a task description for your reward function to teach your AI agent a new skill
- Q&A
- Break
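The core idea of relative scoring from this session can be sketched without RULER's actual API (which is not reproduced here): a judge orders a group of candidate rollouts, and rewards come from the ordering rather than from absolute scores. The helper names below are invented for illustration:

```python
def ranking_from_scores(scores):
    """Best-to-worst candidate indices from a judge's raw scores."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])

def rewards_from_ranking(ranking):
    """Turn a best-to-worst ranking of candidate indices into rewards
    spread evenly over [1.0 ... 0.0]; only the relative order matters."""
    n = len(ranking)
    rewards = [0.0] * n
    for rank, idx in enumerate(ranking):
        rewards[idx] = 1.0 if n == 1 else 1.0 - rank / (n - 1)
    return rewards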
Agent trajectories quickstart: Designing rollout functions (60 minutes)
- Presentation: How to design and collect rollouts with Agent Reinforcement Trainer (ART); defining agent trajectories; how messages, actions, and feedback help your agents self-improve
- Demonstration: Generating multiple agent trajectories to prepare the agents for the optimization stage
- Group discussion: Why relative ranking outperforms static scoring in RL for agents
- Hands-on exercise: Design a training loop to teach your agent to play rock, paper, scissors, lizard, Spock
- Q&A
- Break
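The rollout design covered in this session can be previewed with a toy, self-contained version of the rock, paper, scissors, lizard, Spock exercise. ART's actual `Trajectory` and rollout API differs; here a trajectory is just a list of prompt/action/reward records, and the stand-in policy replaces the LLM being trained:

```python
import random

# Who beats whom in rock, paper, scissors, lizard, Spock.
BEATS = {
    "rock": {"scissors", "lizard"},
    "paper": {"rock", "spock"},
    "scissors": {"paper", "lizard"},
    "lizard": {"paper", "spock"},
    "spock": {"rock", "scissors"},
}
MOVES = sorted(BEATS)

def reward(agent_move, opponent_move):
    """+1 for a win, -1 for a loss, 0 for a tie."""
    if opponent_move in BEATS[agent_move]:
        return 1.0
    if agent_move in BEATS[opponent_move]:
        return -1.0
    return 0.0

def rollout(policy, rounds=5, seed=None):
    """Play `rounds` games against a random opponent; return the
    trajectory (list of records) and its total reward."""
    rng = random.Random(seed)
    trajectory, total = [], 0.0
    for i in range(rounds):
        prompt = f"round {i}: choose one of {MOVES}"
        agent_move = policy(prompt)
        opponent_move = rng.choice(MOVES)
        r = reward(agent_move, opponent_move)
        trajectory.append({"prompt": prompt, "action": agent_move, "reward": r})
        total += r
    return trajectory, total

# Trivial stand-in policy; a real run would sample from the model under training.
always_spock = lambda prompt: "spock"
```

Generating several such rollouts per prompt is what produces the group of trajectories that the next session's optimization step ranks and learns from.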
Optimization and continuous improvement (60 minutes)
- Presentation: Using Group Relative Policy Optimization (GRPO) and Group Sequence Policy Optimization (GSPO) to improve your AI agent; stabilizing the training process to address common challenges
- Demonstration: Using additional histories for complex agent training scenarios
- Hands-on exercise: Create a trajectory with additional history from multi-turn conversations within your multi-agent collaboration
- Q&A
- Break
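Two quantities at the heart of this session can be sketched numerically (a simplified sketch, not the full clipped objectives): GRPO's group-relative advantage, computed by normalizing rewards within a sampled group, and GSPO's length-normalized sequence-level importance ratio, which replaces GRPO's per-token ratios:

```python
import math
from statistics import mean, pstdev

def grpo_advantages(group_rewards):
    """GRPO's baseline-free advantage: A_i = (r_i - mean) / std,
    computed within one group of rollouts for the same prompt."""
    mu, sigma = mean(group_rewards), pstdev(group_rewards)
    if sigma == 0:
        return [0.0 for _ in group_rewards]  # identical rewards: no signal
    return [(r - mu) / sigma for r in group_rewards]

def gspo_sequence_ratio(new_logps, old_logps):
    """GSPO's sequence-level ratio: (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|),
    computed from per-token log-probabilities under the two policies."""
    n = len(new_logps)
    return math.exp(sum(a - b for a, b in zip(new_logps, old_logps)) / n)
```

In a full training step these advantages weight the (clipped) policy-gradient update; the sketch only shows how the raw signals are derived from grouped rewards and log-probabilities.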
Teaching agents to master tools automatically (60 minutes)
- Presentation: Teaching your agent to use any MCP server effectively; using Relative Universal LLM-Elicited Rewards (RULER) to train agents automatically
- Demonstration: Using MCP to facilitate agent-to-agent communication
- Hands-on exercise: Add more comparisons and better tools to improve the agent's MCP server usage
- Q&A
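One way to see how tool-use training can be scored: execute the agent's tool calls against the tools a server exposes and reward the ones that succeed. The sketch below uses a mock in-process registry standing in for an MCP server; the tool names, call format, and reward scheme are invented for illustration and are not the MCP protocol itself:

```python
# Mock "server" tool registry; a real setup would list tools from an MCP server.
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda text: text.upper(),
}

def execute_tool_call(call):
    """Run one {"tool": ..., "args": {...}} call; reward 1.0 on success."""
    try:
        result = TOOLS[call["tool"]](**call["args"])
        return result, 1.0
    except (KeyError, TypeError):
        return None, 0.0  # unknown tool or malformed arguments

def score_trajectory(calls):
    """Fraction of a trajectory's tool calls that executed successfully."""
    if not calls:
        return 0.0
    return sum(execute_tool_call(c)[1] for c in calls) / len(calls)
```

Comparing such scores across a group of trajectories (rather than judging each in isolation) is what lets the relative-reward machinery from earlier sessions drive the agent toward better tool usage.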
Your Instructor
Nicole Koenigstein
Nicole Koenigstein is an independent data scientist and quantitative researcher as well as an AI consultant, leading workshops and guiding companies from AI concept to deployment. Previously, she was CEO and cochief AI officer at Quantmate. Nicole is the author of the books Mathematics for Machine Learning with NLP and Python and Transformers in Action (Manning) and the forthcoming books AI Agents: The Definitive Guide and Transformers: The Definitive Guide for O’Reilly. She shares her expertise in Python, machine learning, and deep learning as a guest lecturer at various universities.
Skills covered
- Generative AI
- Reinforcement Learning
- AI Agents