September 2018
Intermediate to advanced
288 pages
7h 38m
English
Dynamic Programming (DP) is a family of algorithms that compute an optimal policy given a perfect model of the environment in the form of a Markov decision process (MDP). The fundamental idea of DP, and of reinforcement learning in general, is to use state values and action values to guide the search for good policies.
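As a point of reference, the value of a state under a fixed policy can be written as a Bellman expectation equation. The notation here is an assumption borrowed from common textbook usage, not taken from this text: \pi(a \mid s) is the probability of choosing action a in state s, p(s', r \mid s, a) is the environment model, and \gamma is the discount factor.

```latex
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s',\, r} p(s', r \mid s, a)\,\bigl[\, r + \gamma\, v_\pi(s') \,\bigr]
```

The "perfect model" requirement shows up in the term p(s', r \mid s, a): the right-hand side can only be computed when the transition probabilities and rewards are known.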
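DP methods solve an MDP by alternating two steps, policy evaluation and policy improvement. Policy evaluation computes the value of every state under the current policy; policy improvement then updates the policy to act greedily with respect to those values.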
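The alternation of evaluation and improvement can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's implementation: the three-state chain `P`, the discount `GAMMA`, the tolerance values, and all function names are hypothetical and chosen only to keep the example small.

```python
# Hypothetical 3-state chain MDP (an assumption made for illustration).
# P[state][action] = list of (probability, next_state, reward).
# State 2 is absorbing; stepping from state 1 into state 2 pays reward 1.
P = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {"left": [(1.0, 2, 0.0)], "right": [(1.0, 2, 0.0)]},
}
GAMMA = 0.9  # discount factor (assumed value)


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def policy_evaluation(P, policy, gamma, tol=1e-10):
    """Sweep over the states until the values of the fixed policy stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def policy_improvement(P, V, policy, gamma):
    """Switch to a greedy action wherever it is strictly better than the current one."""
    new_policy = dict(policy)
    stable = True
    for s in P:
        best = max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
        # The small margin avoids flipping between actions that are tied.
        if q_value(P, V, s, best, gamma) > q_value(P, V, s, policy[s], gamma) + 1e-9:
            new_policy[s] = best
            stable = False
    return new_policy, stable


def policy_iteration(P, gamma):
    """Alternate evaluation and improvement until the policy no longer changes."""
    policy = {s: next(iter(P[s])) for s in P}  # arbitrary start: first listed action
    while True:
        V = policy_evaluation(P, policy, gamma)
        policy, stable = policy_improvement(P, V, policy, gamma)
        if stable:
            return policy, V


policy, V = policy_iteration(P, GAMMA)
print(policy, V)
```

On this toy chain the loop settles on moving right in states 0 and 1. The model is stored as an explicit table of transitions, which is exactly the kind of complete knowledge of the environment that DP assumes.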