Chapter 2. Modeling reinforcement learning problems: Markov decision processes

This chapter covers

  • String diagrams and our teaching methods
  • The PyTorch deep learning framework
  • Solving n-armed bandit problems
  • Balancing exploration versus exploitation
  • Modeling a problem as a Markov decision process (MDP)
  • Implementing a neural network to solve an advertisement selection problem

This chapter covers some of the most fundamental concepts in all of reinforcement learning, and it will be the basis for the rest of the book. But before we get into that, we want to go over some of the recurring teaching methods we'll employ in this book, most notably the string diagrams we mentioned in the last chapter.

2.1. String diagrams and our teaching methods
