As discussed in Chapter 1, reinforcement learning involves sequential decision-making. In this chapter, we will formalize the notion of using stochastic processes under the branch of probability that models sequential decision-making behavior. While most of the problems we study in reinforcement learning are modeled as Markov decision processes (MDP), we start by first introducing Markov chains (MC) followed by Markov reward processes (MRP). We finish up by discussing MDP in-depth while covering model setup and the assumptions behind MDP.
We then discuss related concepts ...