Markov decision process and the Bellman equation

At the heart of RL is the Markov decision process (MDP). An MDP is often described as a discrete-time stochastic control process. In simpler terms, control proceeds in time steps: at each step an agent observes the current state and chooses an action, the environment moves to a new state with some probability, and the action yields a reward. This process is already widely used in automation control for robotics, drones, and networking, and of course in RL. The classic way we picture this process is shown in the following diagram:

The Markov decision process

We represent an MDP as a tuple, or vector, using the following variables:

  • being a ...
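
As a rough illustration of the discrete-time process described above, the following minimal Python sketch builds a tiny MDP and solves it with value iteration, which repeatedly applies the Bellman optimality backup. Everything in it is an assumption made for illustration: the two states, the two actions, the transition probabilities, the rewards, the discount factor GAMMA, and the helper names q_value, value_iteration, and greedy_policy are hypothetical and follow the common textbook formulation (states, actions, transition probabilities, rewards, discount factor) rather than anything defined in this chapter.

```python
# A hypothetical two-state, two-action MDP. All numbers are made up
# for illustration and are not taken from the book.
STATES = [0, 1]
ACTIONS = [0, 1]  # 0 = stay, 1 = move
GAMMA = 0.9       # assumed discount factor

# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)],
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 1, 2.0)],
        1: [(1.0, 0, 0.0)]},
}


def q_value(V, s, a):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def value_iteration(tol=1e-8, max_iters=10_000):
    """Apply the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in STATES}
    for _ in range(max_iters):
        new_V = {s: max(q_value(V, s, a) for a in ACTIONS) for s in STATES}
        delta = max(abs(new_V[s] - V[s]) for s in STATES)
        V = new_V
        if delta < tol:
            break
    return V


def greedy_policy(V):
    """Pick, in each state, the action with the highest expected return."""
    return {s: max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in STATES}


if __name__ == "__main__":
    V = value_iteration()
    print("state values:", V)
    print("greedy policy:", greedy_policy(V))
```

In this toy setup, the greedy policy moves out of state 0 and stays in state 1, because staying in state 1 collects the largest reward at every step. A dictionary of (probability, next_state, reward) triples is only one possible representation; larger problems would usually store the transition probabilities and rewards as arrays.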
