
Reinforcement Learning in Motion

Video Description

A neat introduction for diving into deep reinforcement learning.
Sandeep Chigurupati

Reinforcement Learning in Motion introduces you to the exciting world of machine systems that learn from their environments! Developer, data scientist, and expert instructor Phil Tabor guides you from the basics all the way to programming your own constantly learning AI agents. In this course, he’ll break down key concepts like how RL systems learn, how to sense and process environmental data, and how to build and train AI agents. As you learn, you’ll master the core algorithms and get to grips with tools like OpenAI Gym, NumPy, and Matplotlib.
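The agent–environment loop at the heart of the course can be sketched in a few lines. OpenAI Gym environments expose a reset/step interface; the toy corridor environment below is a hypothetical stand-in (not course code) that mimics the classic Gym 4-tuple step API, so the loop runs without Gym installed.

```python
import random

class CorridorEnv:
    """Hypothetical 1-D corridor: start at position 0, reach position 4
    for a reward of +1. Mimics the classic Gym interface:
    reset() -> observation, step(action) -> (observation, reward, done, info)."""

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action 1 moves right; action 0 moves left (floored at position 0)
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos == 4
        reward = 1.0 if done else 0.0
        return self.pos, reward, done, {}

env = CorridorEnv()
obs = env.reset()
done = False
while not done:
    action = random.choice([0, 1])       # a random agent, no learning yet
    obs, reward, done, _ = env.step(action)
```

Swapping the random choice for a learned policy, and the corridor for a Gym environment like CartPole, gives the general shape of every agent built in the course.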

Reinforcement systems learn by doing, and so will you in this hands-on course! You’ll build and train a variety of algorithms as you go, each with a specific purpose in mind. The rich and interesting examples include simulations that train a robot to escape a maze, help a mountain car get up a steep hill, and balance a pole on a sliding cart. You’ll even teach your agents how to navigate Windy Gridworld, a standard exercise for finding the optimal path even with special conditions!


With reinforcement learning, an AI agent learns from its environment, constantly responding to the feedback it gets. The agent optimizes its behavior to avoid negative consequences and enhance positive outcomes. The resulting algorithms are always looking for the most positive and efficient outcomes!
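The feedback loop described above can be made concrete with a tiny, hypothetical example (not from the course): two actions, one rewarded and one penalized. A simple incremental action-value update is enough to make the agent favor the positive outcome; the learning rate and rewards here are assumed values.

```python
# One state, two actions: "safe" pays +1, "risky" pays -1.
# The incremental update Q <- Q + alpha * (r - Q) nudges each
# estimate toward the observed feedback.
alpha = 0.1                        # learning rate (assumed value)
Q = {"safe": 0.0, "risky": 0.0}    # the agent's value estimate per action
rewards = {"safe": 1.0, "risky": -1.0}

for _ in range(100):
    for action in Q:
        r = rewards[action]                # feedback from the environment
        Q[action] += alpha * (r - Q[action])

best = max(Q, key=Q.get)   # the agent now prefers the positive outcome
```

After a hundred rounds of feedback the estimates converge near the true rewards, and greedy action selection picks "safe" every time.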

Importantly, with reinforcement learning you don’t need a mountain of data to get started. You just let your AI agent poke and prod its environment, which makes it much easier to take on novel research projects without well-defined training datasets.
Inside:

  • What is a reinforcement learning agent?
  • An introduction to the OpenAI Gym
  • Identifying appropriate algorithms
  • Implementing RL algorithms using NumPy
  • Visualizing performance with Matplotlib
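As a taste of the NumPy-based implementations, here is a hedged sketch of an epsilon-greedy action-value agent on a multi-armed bandit, in the spirit of the test bed built in section 3. This is not the course's code; the arm payoffs, epsilon, and step count are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.1, 0.5, 0.8])  # assumed arm payoffs, unknown to the agent
k = len(true_means)
Q = np.zeros(k)         # action-value estimates
N = np.zeros(k)         # times each arm has been pulled
epsilon = 0.1           # exploration rate

for _ in range(2000):
    if rng.random() < epsilon:
        a = int(rng.integers(k))       # explore: pick a random arm
    else:
        a = int(np.argmax(Q))          # exploit: pick the current best arm
    reward = rng.normal(true_means[a], 1.0)
    N[a] += 1
    Q[a] += (reward - Q[a]) / N[a]     # incremental sample-average update

best_arm = int(np.argmax(Q))
```

Plotting `Q` or the running average reward with Matplotlib, as the course does, shows the estimates settling toward the true payoffs.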
You’ll need to be familiar with Python and machine learning basics. Examples use Python libraries like NumPy and Matplotlib. You'll also need some understanding of linear algebra and calculus; see the equations in the Free Downloads section for examples.

Phil Tabor is a lifelong coder with a passion for simplifying and teaching complex topics. A physics PhD and former Intel process engineer, he works as a data scientist, teaches machine learning on YouTube, and contributes to Sensenet, an open source project using deep reinforcement learning to teach robots to identify objects by touch.

After watching the first few sections, you'll be able to experiment with some simple algorithms, and you'll definitely want to continue learning more.
Rob Pacheco

Gives a fantastic look at both the examples and the mathematical background.
Harald Kuhn

It prepares you to apply reinforcement learning directly to a problem you have at hand!
Yaser Marey

Table of Contents

  1. INTRODUCTION TO REINFORCEMENT LEARNING
    1. Course introduction 00:05:01
    2. Getting Acquainted with Machine Learning 00:09:26
    3. How Reinforcement Learning Fits In 00:05:26
    4. Required software 00:03:10
  2. KEY CONCEPTS
    1. Understanding the agent 00:05:04
    2. Defining the environment 00:05:52
    3. Designing the reward 00:04:24
    4. How the agent learns 00:09:59
    5. Choosing actions 00:07:15
    6. Coding the environment 00:06:23
    7. Finishing the maze-running robot problem 00:05:00
  3. BEATING THE CASINO: THE EXPLORE-EXPLOIT DILEMMA
    1. Introducing the multi-armed bandit problem 00:03:47
    2. Action-value methods 00:06:43
    3. Coding the multi-armed bandit test bed 00:06:55
    4. Moving the goal posts: nonstationary problems 00:07:08
    5. Optimistic initial values and upper confidence bound action selection 00:11:51
    6. Wrapping up the explore-exploit dilemma 00:04:51
  4. SKATING THE FROZEN LAKE: MARKOV DECISION PROCESSES
    1. Introducing Markov decision processes and the frozen lake environment 00:09:21
    2. Even robots have goals 00:06:45
    3. Handling uncertainty with policies and value functions 00:08:37
    4. Achieving mastery: Optimal policies and value functions 00:07:30
    5. Skating off the frozen lake 00:05:29
  5. NAVIGATING GRIDWORLD WITH DYNAMIC PROGRAMMING
    1. Crash-landing on planet Gridworld 00:09:42
    2. Let's make a plan: Policy evaluation in Gridworld 00:08:18
    3. The best laid plans: Policy improvement in the Gridworld 00:03:57
    4. Hastening our escape with policy iteration 00:04:57
    5. Creating a backup plan with value iteration 00:06:09
    6. Wrapping up dynamic programming 00:04:08
  6. NAVIGATING THE WINDY GRIDWORLD WITH MONTE CARLO METHODS
    1. The windy gridworld problem 00:05:33
    2. Monte who? 00:07:12
    3. No substitute for action: Policy evaluation with Monte Carlo methods 00:03:53
    4. Monte Carlo control and exploring starts 00:07:43
    5. Monte Carlo control without exploring starts 00:06:15
    6. Off-policy Monte Carlo methods 00:12:06
    7. Return to the frozen lake and wrapping up Monte Carlo methods 00:06:17
  7. BALANCING THE CART POLE: TEMPORAL DIFFERENCE LEARNING
    1. The cart pole problem 00:04:57
    2. TD(0) prediction 00:09:19
    3. On-policy TD control: SARSA 00:07:34
    4. Off-policy TD control: Q learning 00:05:13
    5. Back to school with double learning 00:09:06
    6. Wrapping up temporal difference learning 00:05:43
  8. CLIMBING THE MOUNTAIN WITH APPROXIMATION METHODS
    1. The continuous mountain car problem 00:04:31
    2. Why approximation methods? 00:05:47
    3. Stochastic gradient descent: The intuition 00:04:05
    4. Stochastic gradient descent: The mathematics 00:05:18
    5. Approximate Monte Carlo predictions 00:08:43
    6. Linear methods and tiling 00:10:54
    7. TD(0) semi-gradient prediction 00:07:36
    8. Episodic semi-gradient control: SARSA 00:08:52
    9. Over the hill: wrapping up approximation methods and the mountain car problem 00:06:10
  9. SUMMARY
    1. Course recap 00:10:11
    2. The frontiers of reinforcement learning 00:06:31
    3. What to do next 00:04:05