December 2018
Beginner to intermediate
684 pages
21h 9m
English
Finite Markov decision processes (MDPs) are a simple yet fundamental framework. We introduce the trajectories of rewards that the agent aims to optimize, and define the policy and value functions that are used to formulate the optimization problem, along with the Bellman equations that form the basis for the solution methods.
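To make these terms concrete, here is a minimal sketch in Python. It is not taken from the book: the two-state MDP, its rewards, the discount factor of 0.9, and the function names are all assumptions chosen for illustration. It computes the discounted return of a reward trajectory, then repeatedly applies the Bellman optimality backup (value iteration) to obtain state values and a greedy policy.

```python
# Hypothetical two-state MDP, invented for illustration only.
GAMMA = 0.9  # assumed discount factor

# P[state][action] -> list of (probability, next_state, reward)
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}


def discounted_return(rewards, gamma=GAMMA):
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for a finite trajectory."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def action_value(P, v, s, a, gamma=GAMMA):
    """Expected one-step reward plus discounted value of the next state."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=GAMMA, tol=1e-10):
    """Apply the Bellman optimality backup until the values stop changing."""
    v = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(action_value(P, v, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            return v


def greedy_policy(P, v, gamma=GAMMA):
    """Pick, in each state, the action with the highest action value."""
    return {s: max(P[s], key=lambda a: action_value(P, v, s, a, gamma)) for s in P}


values = value_iteration(P)
policy = greedy_policy(P, values)
```

In this toy problem the agent does best by moving to state 1 and staying there, so the values converge to roughly 19 for state 0 and 20 for state 1. Value iteration is only one of the solution methods built on the Bellman equations; it is used here because it fits in a few lines.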