© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2023
M. Hu, The Art of Reinforcement Learning, https://doi.org/10.1007/978-1-4842-9606-6_3

3. Dynamic Programming

Michael Hu (1)

(1) Shanghai, Shanghai, China

The ultimate goal of modeling a reinforcement learning problem as a Markov decision process (MDP) is to use the Bellman equations to find an optimal policy $$\pi_*$$ that maximizes the expected cumulative reward. However, finding such a policy is not always straightforward. In this chapter, we’ll introduce dynamic programming (DP) algorithms as a way to find the optimal policy when we have access to a perfect ...
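To make the idea of finding $$\pi_*$$ from the Bellman equations concrete, here is a minimal sketch of value iteration, one standard DP method, on a toy problem. It is not taken from the chapter: the two-state MDP `TOY_MDP`, the helper `q_value`, the function name `value_iteration`, the discount `gamma=0.9`, and the tolerance `tol` are all hypothetical choices made for illustration, and the sketch assumes every transition probability and reward is known in advance.

```python
# Hypothetical toy MDP (an assumption for illustration, not from the chapter).
# TOY_MDP[state][action] is a list of (probability, next_state, reward) tuples.
TOY_MDP = {
    0: {0: [(1.0, 0, 0.0)],   # state 0, action 0 ("stay"): remain in 0, reward 0
        1: [(1.0, 1, 1.0)]},  # state 0, action 1 ("move"): go to 1, reward 1
    1: {0: [(1.0, 1, 2.0)],   # state 1, action 0 ("stay"): remain in 1, reward 2
        1: [(1.0, 0, 0.0)]},  # state 1, action 1 ("move"): go to 0, reward 0
}


def q_value(outcomes, values, gamma):
    """Expected one-step return: sum over outcomes of p * (r + gamma * V(s'))."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(mdp, gamma=0.9, tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    values = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            # Bellman optimality backup: take the best action value.
            best = max(q_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Read off a greedy policy with respect to the converged values.
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], values, gamma))
        for s, actions in mdp.items()
    }
    return values, policy


values, policy = value_iteration(TOY_MDP)
print(values, policy)
```

In this toy problem the greedy policy moves from state 0 to state 1 and then stays there, since staying in state 1 pays the largest recurring reward. The state values converge to approximately 19 and 20, which matches solving the Bellman optimality equations by hand: $$v_*(1) = 2 / (1 - 0.9) = 20$$ and $$v_*(0) = 1 + 0.9 \cdot 20 = 19$$.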
