RL as a Markov decision process

A Markov decision process (MDP) is a mathematical framework for modeling decisions. We can use it to describe the RL problem. We'll assume that we work with a full knowledge of the environment. An MDP provides a formal definition of the properties we defined in the previous section (and adds some new ones):

  • is the finite set of all possible environment states, and st is the state at time t.
  • is the set of all possible actions, and at is the action at time t.
  • is the dynamics of the environment (also known ...

Get Python Deep Learning - Second Edition now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.