M. Hu, The Art of Reinforcement Learning, https://doi.org/10.1007/978-1-4842-9606-6_3

3. Dynamic Programming

Michael Hu¹

(1) Shanghai, Shanghai, China

The ultimate goal of modeling a reinforcement learning problem as a Markov decision process (MDP) is to use the Bellman equations to find an optimal policy $\pi_*$ that maximizes the expected cumulative reward. However, finding such a policy is not always straightforward. In this chapter, we’ll introduce dynamic programming (DP) algorithms as a way to find the optimal policy when we have access to a perfect ...
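To make the idea concrete, here is a minimal sketch of value iteration, one common DP algorithm that repeatedly applies the Bellman optimality backup. It is an illustration rather than the chapter's own code. It assumes a small tabular MDP whose transition probabilities and expected rewards are fully known; the function name `value_iteration`, the arrays `P` and `R`, the discount `gamma`, and the two-state toy MDP are all hypothetical choices made for this example.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Sketch of value iteration for a known tabular MDP (illustrative only).

    P: array of shape (S, A, S), P[s, a, s2] = probability of moving to s2.
    R: array of shape (S, A), expected immediate reward for action a in state s.
    Returns the estimated optimal state values and a greedy policy.
    """
    num_states = P.shape[0]
    V = np.zeros(num_states)
    for _ in range(max_iters):
        # Bellman optimality backup: Q[s, a] = R[s, a] + gamma * sum_s2 P[s, a, s2] * V[s2]
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Extract a greedy policy with respect to the final value estimate.
    Q = R + gamma * (P @ V)
    policy = Q.argmax(axis=1)
    return V, policy


# Hypothetical two-state, two-action MDP (assumed for illustration).
# State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 1).
# State 1: action 0 stays (reward 2), action 1 moves to state 0 (reward 0).
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
])
R = np.array([
    [0.0, 1.0],
    [2.0, 0.0],
])

V, policy = value_iteration(P, R, gamma=0.9)
print(V, policy)
```

For this toy MDP the values can be checked by hand: staying in state 1 forever is worth 2 / (1 - 0.9) = 20, and moving there from state 0 is worth 1 + 0.9 × 20 = 19, so the sketch should return values close to 19 and 20 with the greedy policy "move" in state 0 and "stay" in state 1.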


The Art of Reinforcement Learning: Fundamentals, Mathematics, and Implementations with Python by Michael Hu
