April 2018
The Markov decision process (MDP) is a formal framework used in reinforcement learning to model how an agent makes decisions; in this chapter it is applied to a gridworld environment. A gridworld environment consists of states arranged as the cells of a grid, such as the one in the FrozenLake-v0 environment from OpenAI Gym, which we examined and solved in the last chapter.
An MDP describes such a world through four components: states, actions, a transition model (often just called the model), and rewards. A solution to an MDP is called a policy, which maps each state to an action, and the objective is to find the optimal policy for the task.
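To make the four components concrete, here is a minimal sketch of a tiny gridworld written as an MDP. Everything in it is an assumption made for illustration rather than something taken from FrozenLake-v0: the 2x2 grid, the goal cell, the deterministic transitions, the reward of 1.0 for reaching the goal, and the names `STATES`, `ACTIONS`, `transition`, `reward`, `POLICY`, and `rollout` are all hypothetical.

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]   # (row, col) cell in the grid
Action = str              # "up", "down", "left", or "right"

ROWS, COLS = 2, 2         # assumed toy grid size
GOAL: State = (1, 1)      # assumed goal cell

# States: every cell of the grid.
STATES: List[State] = [(r, c) for r in range(ROWS) for c in range(COLS)]

# Actions: four moves, stored as (row delta, col delta).
ACTIONS: Dict[Action, Tuple[int, int]] = {
    "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1),
}


def transition(state: State, action: Action) -> Dict[State, float]:
    """Transition model: probability of each next state given (state, action).

    Assumed deterministic: the move always succeeds, except that moving
    off the grid leaves the agent where it is. The goal is absorbing.
    """
    if state == GOAL:
        return {state: 1.0}
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    next_state = (r, c) if 0 <= r < ROWS and 0 <= c < COLS else state
    return {next_state: 1.0}


def reward(state: State, action: Action, next_state: State) -> float:
    """Reward: 1.0 for stepping into the goal, 0.0 for everything else."""
    return 1.0 if next_state == GOAL and state != GOAL else 0.0


# A policy maps every state to an action. This one is hand-written,
# not computed by any solver.
POLICY: Dict[State, Action] = {
    (0, 0): "right", (0, 1): "down", (1, 0): "right", (1, 1): "up",
}


def rollout(start: State, policy: Dict[State, Action], max_steps: int = 10) -> float:
    """Follow the policy from `start` and return the total reward collected."""
    state, total = start, 0.0
    for _ in range(max_steps):
        if state == GOAL:
            break
        action = policy[state]
        # Deterministic model: the distribution has exactly one next state.
        next_state = next(iter(transition(state, action)))
        total += reward(state, action, next_state)
        state = next_state
    return total


if __name__ == "__main__":
    print(rollout((0, 0), POLICY))
```

Returning a dictionary of probabilities from `transition`, even though each one holds a single entry here, keeps the same interface usable for a stochastic environment in which an action may lead to several possible next states.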
Thus, any reinforcement learning task that consists of a set of states, actions, and rewards and that satisfies the Markov property can be treated as an MDP.
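For reference, the Markov property mentioned above is usually written as follows. The notation is an assumption of this sketch, not the chapter's own: $s_t$ and $a_t$ stand for the state and action at time step $t$, and $P$ for the transition probability.

```latex
P(s_{t+1} \mid s_t, a_t) = P(s_{t+1} \mid s_1, a_1, s_2, a_2, \ldots, s_t, a_t)
```

In words, the distribution over the next state depends only on the current state and action, not on the earlier history of states and actions.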
In this chapter, ...