Learning Path

Reinforcement Learning in Python

Time to complete: 1h 29m

Published by Infinite Skills and O'Reilly Media, Inc.

Created November 2017

What is this learning path about, and why is it important?

One of the hottest topics in IT today is machine learning. Businesses of all sizes and in every industry are keen to extract valuable insights from the enormous volumes of data they continuously collect, taking advantage of an array of new resources in hardware, software, and, of course, cloud-based tools. If you’re looking for a way to amp up your machine learning skills, look no further than reinforcement learning, which is taking center stage as a way to improve your machine learning results over the long term.

In this learning path for advanced-level developers, data scientists, and data engineers, author and entrepreneur Matt Kirk introduces you to the basics of reinforcement learning through the application of a primary technique: Q-Learning. You’ll also see how to write code based on the Bellman equations that makes better short-term decisions to improve your long-term results. To reinforce what you’ve learned, Matt walks you through a hands-on application of Q-Learning in which you’ll build an optimal stock trading strategy. You’ll review the code for trading stocks, evaluate your model’s performance, and extend the application using Dyna to optimize your model specifically for trading. Matt also delves into another application, writing software that plays games intelligently, and reviews the related algorithms. All code examples are written in Python and are available as part of this learning path. Over the course of this learning path, you’ll apply practical techniques to get started quickly and see the results that reinforcement learning can provide.

What you’ll learn—and how you can apply it

  • Understanding and applying the Q-Learning technique
  • Using the Dyna model to optimize stock-trading models
  • Changing directions quickly, utilizing temporal difference learning, performing relevant A/B testing, and optimizing your model’s performance using n-armed bandit algorithms
  • Navigating the exploration/exploitation spectrum, and deciding when to try something new versus exploiting what you know

This learning path is for you because…

  • You're a data scientist or engineer looking to advance your machine learning knowledge through the application of deep reinforcement learning
  • You want to learn about the Q-Learning technique
  • You're an ecommerce developer tasked with identifying nuanced consumer patterns


Prerequisites:

  • You should be familiar with supervised and unsupervised learning methods
  • You should also be well versed in Python 3.x development