Markov decision processes (MDPs) offer a powerful framework for tackling sequential decision-making problems in reinforcement learning. Their applications span various domains, including robotics, finance, and optimal control.
In this chapter, we provide an overview of the key components of Markov decision processes (MDPs) and demonstrate the formulation of a basic reinforcement learning problem using the MDP framework. We delve into the concepts of policy and value functions, examine the Bellman equations, ...