Develop Self-Improving AI Agents with Reinforcement Learning
Published by O'Reilly Media, Inc.
Build a prototype that can iteratively enhance its own reasoning and collaboration skills
What you’ll learn and how you can apply it
- Understand reinforcement learning for LLMs
- Design an evaluation-driven training loop that measures agent performance through rollouts and feedback with Agent Reinforcement Trainer
- Implement reinforcement learning pipelines that enable agents to learn from rewards, verifiers, and benchmarks with ART and RULER (Relative Universal LLM-Elicited Rewards) to create self-improving workflows
- Develop strategies for optimizing agent reasoning, tool use, and collaboration using Group Relative Policy Optimization (GRPO) and Group Sequence Policy Optimization (GSPO)
- Train agents to use any MCP tool server
- Integrate multi-agent coordination and communication through MCP
Course description
The next generation of AI agents will not just execute instructions; they’ll learn how to get better at working together and using tools effectively to solve real-world problems. In this hands-on course with AI expert and author Nicole Koenigstein, you’ll design and train self-improving agents using reinforcement learning.
You’ll start with the fundamentals of RL for language-model agents and build an evaluation loop where each rollout produces agent trajectories and feedback. You’ll also trace the conceptual chain from training infrastructure through group-level optimization to meta-reward synthesis, turning feedback into learning signals that enhance your agent’s reasoning, decision-making, and tool use. Finally, you’ll learn how to teach agents to use MCP servers and how MCP can facilitate structured communication and shared context across a multi-agent system. By the end of the session, you’ll have a working prototype of a self-improving AI agent that can iteratively enhance its own reasoning and collaboration skills.
This live event is for you because...
- You’re a machine learning engineer, AI researcher, or AI practitioner who wants to improve the reliability and quality of your agentic systems.
- You want to add reinforcement-learning capabilities to existing multi-agent systems to improve their coordination and collective problem-solving.
- You’re an AI practitioner who’s exploring open-ended agentic systems such as code generation or deep search.
- You want to teach agents how to use any external tool via MCP.
Prerequisites
- A Python 3.12 environment (Google Colab recommended)
- Dependencies installed from the provided GitHub repository (link to come)
- An API key for OpenRouter
- Intermediate-level Python programming experience (classes, functions, loops)
- Experience interacting with LLMs and accessing them through an API
- Familiarity with LangGraph
Recommended preparation:
- Read “From LLMs to Agents: The Foundational Blueprint” and “Architectures and Patterns: Planning, Reactivity, and Multi-Agent Systems” (chapters 1 and 2 in AI Agents: The Definitive Guide)
- Read “Reinforcement Learning Transformers” (chapter 7 in Transformers: The Definitive Guide)
Recommended follow-up:
- Read AI Agents: The Definitive Guide (book)
- Read Transformers: The Definitive Guide (book)
Schedule
The time frames are only estimates and may vary according to how the class is progressing.
Introduction to LLM reinforcement learning (60 minutes)
- Presentation: Foundations of RL for LLMs—policies, rollouts, trajectories, and rewards to model LLM behavior
- Demonstration: A minimal reinforcement loop showing how a simple policy learns from relative feedback
- Group discussion: Why RL matters for self-improving agentic systems
- Hands-on exercise: Run a notebook with a simplified Frozen Lake game world; adjust the policy to see how it influences the trajectories and success rate
- Q&A
- Break
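The Frozen Lake exercise above can be previewed with a minimal, self-contained sketch (the course uses its own notebook; the grid layout, rewards, and hyperparameters below are illustrative assumptions, and tabular Q-learning stands in for the policy being adjusted):

```python
import random

GRID = "SFFF" "FHFH" "FFFH" "HFFG"  # 4x4 lake: S start, F frozen, H hole, G goal
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    r, c = divmod(state, 4)
    dr, dc = ACTIONS[action]
    nr, nc = min(max(r + dr, 0), 3), min(max(c + dc, 0), 3)
    nxt = nr * 4 + nc
    return nxt, float(GRID[nxt] == "G"), GRID[nxt] in "HG"

def train(episodes=5000, alpha=0.5, gamma=0.95, seed=0):
    """Off-policy Q-learning: explore with random actions, learn greedy values."""
    rng = random.Random(seed)
    q = [[0.0] * 4 for _ in range(16)]  # one Q-value per (state, action)
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap per episode
            a = rng.randrange(4)
            s2, reward, done = step(s, a)
            # Move Q(s, a) toward reward + gamma * max_a' Q(s', a')
            target = reward + (0.0 if done else gamma * max(q[s2]))
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q

def success_rate(q, trials=100):
    """Follow the greedy policy; fraction of trials that reach the goal."""
    wins = 0
    for _ in range(trials):
        s, done, reward = 0, False, 0.0
        for _ in range(50):
            s, reward, done = step(s, max(range(4), key=lambda a: q[s][a]))
            if done:
                break
        wins += reward == 1.0
    return wins / trials
```

Changing the exploration budget, `gamma`, or the grid's hole placement and re-measuring `success_rate` mirrors the "adjust the policy, watch the trajectories" loop of the exercise.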
Reward modeling and relative scoring (60 minutes)
- Presentation: How reward functions shape behavior; why relative scoring matters more than absolute metrics; intro to Relative Universal LLM-Elicited Rewards (RULER)
- Demonstration: Relative ranking with RULER
- Group discussion: Why relative ranking is more stable and generalizable for open-ended tasks
- Hands-on exercise: Write a task description for your reward function to teach your AI agent a new skill
- Q&A
- Break
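The core idea of relative scoring from this session can be sketched without RULER's actual API (which is not reproduced here): a judge orders a group of candidate rollouts, and rewards come from the ordering rather than from absolute scores. The helper names below are invented for illustration:

```python
def ranking_from_scores(scores):
    """Best-to-worst candidate indices from a judge's raw scores."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])

def rewards_from_ranking(ranking):
    """Turn a best-to-worst ranking of candidate indices into rewards
    spread evenly over [1.0 ... 0.0]; only the relative order matters."""
    n = len(ranking)
    rewards = [0.0] * n
    for rank, idx in enumerate(ranking):
        rewards[idx] = 1.0 if n == 1 else 1.0 - rank / (n - 1)
    return rewards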
Agent trajectories quickstart: Designing rollout functions (60 minutes)
- Presentation: How to design and collect rollouts with Agent Reinforcement Trainer (ART); defining agent trajectories; how messages, actions, and feedback help your agents self-improve
- Demonstration: Generating multiple agent trajectories to prepare the agents for the optimization stage
- Group discussion: Why relative ranking outperforms static scoring in RL for agents
- Hands-on exercise: Design a training loop to teach your agent to play rock, paper, scissors, lizard, Spock
- Q&A
- Break
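The rollout design covered in this session can be previewed with a toy, self-contained version of the rock, paper, scissors, lizard, Spock exercise. ART's actual `Trajectory` and rollout API differs; here a trajectory is just a list of prompt/action/reward records, and the stand-in policy replaces the LLM being trained:

```python
import random

# Who beats whom in rock, paper, scissors, lizard, Spock.
BEATS = {
    "rock": {"scissors", "lizard"},
    "paper": {"rock", "spock"},
    "scissors": {"paper", "lizard"},
    "lizard": {"paper", "spock"},
    "spock": {"rock", "scissors"},
}
MOVES = sorted(BEATS)

def reward(agent_move, opponent_move):
    """+1 for a win, -1 for a loss, 0 for a tie."""
    if opponent_move in BEATS[agent_move]:
        return 1.0
    if agent_move in BEATS[opponent_move]:
        return -1.0
    return 0.0

def rollout(policy, rounds=5, seed=None):
    """Play `rounds` games against a random opponent; return the
    trajectory (list of records) and its total reward."""
    rng = random.Random(seed)
    trajectory, total = [], 0.0
    for i in range(rounds):
        prompt = f"round {i}: choose one of {MOVES}"
        agent_move = policy(prompt)
        opponent_move = rng.choice(MOVES)
        r = reward(agent_move, opponent_move)
        trajectory.append({"prompt": prompt, "action": agent_move, "reward": r})
        total += r
    return trajectory, total

# Trivial stand-in policy; a real run would sample from the model under training.
always_spock = lambda prompt: "spock"
```

Generating several such rollouts per prompt is what produces the group of trajectories that the next session's optimization step ranks and learns from.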
Optimization and continuous improvement (60 minutes)
- Presentation: Using Group Relative Policy Optimization (GRPO) and Group Sequence Policy Optimization (GSPO) to improve your AI agent; stabilizing the training process to address common challenges
- Demonstration: Using additional histories for complex agent training scenarios
- Hands-on exercise: Create a trajectory with additional history from multi-turn conversations within your multi-agent collaboration
- Q&A
- Break
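Two quantities at the heart of this session can be sketched numerically (a simplified sketch, not the full clipped objectives): GRPO's group-relative advantage, computed by normalizing rewards within a sampled group, and GSPO's length-normalized sequence-level importance ratio, which replaces GRPO's per-token ratios:

```python
import math
from statistics import mean, pstdev

def grpo_advantages(group_rewards):
    """GRPO's baseline-free advantage: A_i = (r_i - mean) / std,
    computed within one group of rollouts for the same prompt."""
    mu, sigma = mean(group_rewards), pstdev(group_rewards)
    if sigma == 0:
        return [0.0 for _ in group_rewards]  # identical rewards: no signal
    return [(r - mu) / sigma for r in group_rewards]

def gspo_sequence_ratio(new_logps, old_logps):
    """GSPO's sequence-level ratio: (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|),
    computed from per-token log-probabilities under the two policies."""
    n = len(new_logps)
    return math.exp(sum(a - b for a, b in zip(new_logps, old_logps)) / n)
```

In a full training step these advantages weight the (clipped) policy-gradient update; the sketch only shows how the raw signals are derived from grouped rewards and log-probabilities.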
Teaching agents to master tools automatically (60 minutes)
- Presentation: Teaching your agent to use any MCP server effectively; using Relative Universal LLM-Elicited Rewards (RULER) to train agents automatically
- Demonstration: Using MCP to facilitate agent-to-agent communication
- Hands-on exercise: Add more comparisons and better tools to improve the agent's MCP server usage
- Q&A
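One way to see how tool-use training can be scored: execute the agent's tool calls against the tools a server exposes and reward the ones that succeed. The sketch below uses a mock in-process registry standing in for an MCP server; the tool names, call format, and reward scheme are invented for illustration and are not the MCP protocol itself:

```python
# Mock "server" tool registry; a real setup would list tools from an MCP server.
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda text: text.upper(),
}

def execute_tool_call(call):
    """Run one {"tool": ..., "args": {...}} call; reward 1.0 on success."""
    try:
        result = TOOLS[call["tool"]](**call["args"])
        return result, 1.0
    except (KeyError, TypeError):
        return None, 0.0  # unknown tool or malformed arguments

def score_trajectory(calls):
    """Fraction of a trajectory's tool calls that executed successfully."""
    if not calls:
        return 0.0
    return sum(execute_tool_call(c)[1] for c in calls) / len(calls)
```

Comparing such scores across a group of trajectories (rather than judging each in isolation) is what lets the relative-reward machinery from earlier sessions drive the agent toward better tool usage.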
Your Instructor
Nicole Koenigstein
Nicole Koenigstein is an independent data scientist and quantitative researcher as well as an AI consultant, leading workshops and guiding companies from AI concept to deployment. Previously, she was CEO and cochief AI officer at Quantmate. Nicole is the author of the books Mathematics for Machine Learning with NLP and Python and Transformers in Action (Manning) and the forthcoming books AI Agents: The Definitive Guide and Transformers: The Definitive Guide for O’Reilly. She shares her expertise in Python, machine learning, and deep learning as a guest lecturer at various universities.
Skills covered
- Generative AI
- Reinforcement Learning
- AI Agents