January 2019
Intermediate to advanced
386 pages
11h 13m
English
Dynamic programming (DP) is the foundation of many reinforcement learning (RL) algorithms. The core idea of DP algorithms is to use the state-value and action-value functions as tools for finding the optimal policy, given a fully known model of the environment (that is, its transition probabilities and rewards). In this section, we'll see how to do that.
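To make the idea concrete, here is a minimal sketch of value iteration, one common DP algorithm, on a toy problem. The three-state chain environment, the model dictionary `P`, and the function names `q_value` and `value_iteration` are all assumptions made for illustration; they are not taken from the text.

```python
# Hypothetical, fully known model of a 3-state chain (illustrative only).
# P[state][action] is a list of (probability, next_state, reward) tuples.
# Action 0 moves left, action 1 moves right; state 2 is terminal (absorbing).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}


def q_value(P, V, s, a, gamma):
    """Action value: expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-8):
    """Sweep the Bellman optimality update until the values stop changing."""
    V = {s: 0.0 for s in P}  # state-value estimates, initialised to zero
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update
        if delta < tol:
            break
    # Greedy policy: in each state, pick the action with the highest action value.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P)
```

In this sketch, the state-value function `V` is computed from the known model, and the action values from `q_value` are then used to read off a greedy policy. With a discount factor of 0.9, state 1 ends up worth 1.0 and state 0 worth 0.9, so the greedy policy moves right in both non-terminal states.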