A Markov decision process (MDP) is a mathematical framework for modeling decisions. We can use it to describe the RL problem. We'll assume that we work with a full knowledge of the environment. An MDP provides a formal definition of the properties we defined in the previous section (and adds some new ones):
-
is the finite set of all possible environment states, and st is the state at time t.
is the set of all possible actions, and at is the action at time t.
- is the dynamics of the environment (also known ...