The ultimate goal of modeling a reinforcement learning problem using Markov decision processes (MDPs) is that we can use the Bellman equations to find an optimal policy that maximizes the expected cumulative reward. However, finding such a policy is not always straightforward. In this chapter, we’ll introduce dynamic programming (DP) algorithms as a way to find the optimal policy when we have access to a perfect ...
3. Dynamic Programming
Get The Art of Reinforcement Learning: Fundamentals, Mathematics, and Implementations with Python now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.