Now that we have a basic understanding of MRPs, we can move on to MDPs. An MDP is an MRP that also involves decisions. All the states in the environment are still Markov, so the next state depends only on the current state (and the action taken). Formally, an MDP can be represented as a tuple $\langle S, A, P, R, \gamma \rangle$, where $S$ is the state space, $A$ is the action set, $P$ is the state transition probability function, $R$ is the reward function, and $\gamma$ is the discount rate. The state transition probability function $P$ and the reward function $R$ are formally defined as:

$$P_{ss'}^{a} = \mathbb{P}\left[S_{t+1} = s' \mid S_t = s, A_t = a\right]$$

$$R_{s}^{a} = \mathbb{E}\left[R_{t+1} \mid S_t = s, A_t = a\right]$$
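To make the tuple concrete, here is a minimal sketch of how a small finite MDP could be represented in code. The two states, two actions, and all probabilities and rewards below are purely illustrative assumptions, not part of the definition above.

```python
import numpy as np

# Illustrative finite MDP with 2 states and 2 actions.
states = ["s0", "s1"]
actions = ["a0", "a1"]
gamma = 0.9  # discount rate

# P[a][s, s'] = probability of transitioning to s' when taking action a in state s.
P = {
    "a0": np.array([[0.8, 0.2],
                    [0.1, 0.9]]),
    "a1": np.array([[0.5, 0.5],
                    [0.3, 0.7]]),
}

# R[a][s] = expected immediate reward for taking action a in state s.
R = {
    "a0": np.array([1.0, 0.0]),
    "a1": np.array([0.0, 2.0]),
}

# Sanity check: each row of each transition matrix must sum to 1.
for a in actions:
    assert np.allclose(P[a].sum(axis=1), 1.0)
```

Storing one transition matrix and one reward vector per action keeps the structure close to the definitions of $P_{ss'}^{a}$ and $R_{s}^{a}$, which makes later computations (such as Bellman backups) straightforward.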
We can also formally define ...