Dynamic programming
DP is a general algorithmic paradigm that breaks a problem into smaller, overlapping subproblems and then solves the original problem by combining the solutions of those subproblems.
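As a minimal, generic illustration of overlapping subproblems (not specific to reinforcement learning), the sketch below uses the Fibonacci sequence, an example chosen here for clarity rather than taken from the text. A naive recursive `fib(n)` recomputes `fib(n - 2)` many times; caching each result, or building the table from the smallest case upward, solves each subproblem exactly once.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """Top-down DP: recursion plus a cache of already-solved subproblems."""
    if n < 2:
        return n  # base cases: fib(0) = 0, fib(1) = 1
    # fib(n - 1) and fib(n - 2) share most of their subproblems;
    # lru_cache ensures each fib(k) is computed only once.
    return fib(n - 1) + fib(n - 2)


def fib_bottom_up(n: int) -> int:
    """Bottom-up DP: combine the two previous solutions at each step."""
    prev, curr = 0, 1  # fib(0), fib(1)
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev


print(fib(30), fib_bottom_up(30))  # prints "832040 832040"
```

Both versions run in linear time in `n`, whereas the uncached recursion grows exponentially because it re-solves the same subproblems.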
DP can be used in reinforcement learning (RL) and is one of the simplest approaches. It can compute optimal policies, provided it is given a perfect model of the environment.
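To make "given a perfect model" concrete, here is a minimal sketch of value iteration, one standard DP method for computing an optimal policy. It assumes the model is a plain dictionary mapping each state and action to a list of `(probability, next_state, reward)` outcomes; this representation, the function names, and the three-state `toy_model` are hypothetical and chosen only for illustration. Note that every sweep updates the value of every state by looking at all actions and all of their possible outcomes.

```python
from typing import Dict, List, Tuple

# Hypothetical model format: model[state][action] -> [(prob, next_state, reward), ...]
Model = Dict[int, Dict[str, List[Tuple[float, int, float]]]]


def q_value(model: Model, V: Dict[int, float], s: int, a: str, gamma: float) -> float:
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in model[s][a])


def value_iteration(model: Model, gamma: float = 0.9, theta: float = 1e-8):
    """Sweep all states until values change by less than theta, then act greedily."""
    V = {s: 0.0 for s in model}
    while True:
        delta = 0.0
        for s, actions in model.items():
            if not actions:  # terminal state: value stays 0
                continue
            # Bellman optimality backup, updated in place during the sweep
            best = max(q_value(model, V, s, a, gamma) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Greedy policy with respect to the converged values
    policy = {
        s: max(actions, key=lambda a, s=s: q_value(model, V, s, a, gamma))
        for s, actions in model.items()
        if actions
    }
    return V, policy


# Hypothetical 3-state chain: reaching state 2 (terminal) yields reward 1.
toy_model: Model = {
    0: {"stay": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(0.9, 2, 1.0), (0.1, 1, 0.0)]},
    2: {},
}

V, policy = value_iteration(toy_model)
print(policy)  # prints "{0: 'right', 1: 'right'}"
```

The sketch relies entirely on `toy_model` being known in advance; without access to the transition probabilities and rewards, the backup in `q_value` could not be computed.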
DP is an important stepping stone in the history of RL algorithms and provides the foundation for the next generation of algorithms, but it is computationally very expensive. In practice, DP works only with MDPs that have a limited number of states and actions, because it has to update the value of each state (or action-value), taking into consideration all ...