A Markov decision process (MDP) is a mathematical framework for modeling sequential decision-making. We can use it to describe the RL problem formally. For now, we'll assume that we have full knowledge of the environment. An MDP provides a formal definition of the properties we introduced in the previous section (and adds some new ones):
- $S$ is the finite set of all possible environment states, and $s_t$ is the state at time $t$.
- $A$ is the set of all possible actions, and $a_t$ is the action at time $t$.
- $P$ is the dynamics of the environment (also known ...