The problem of reinforcement learning can be formulated as the MDP. The MDP is the mathematical notion to define the mutual interaction between states, actions, and rewards in the environment. It contains the following factors:
- Set of states S
- Space of actions A(s)
- Initial states
- State transition function
- Reward function
The MDP is a probabilistic process described by these factors. S contains all possible states in the process. A(s)