Summary
In this chapter, we took an in-depth look at DP and the Bellman equation. Together, they have significantly influenced RL by introducing the concepts of future rewards and optimization. We covered Bellman's contribution by first looking at DP and how to solve a problem dynamically. We then moved on to the Bellman optimality equation and how it can be used to account for future rewards and to determine expected state and action values using iterative methods. In particular, we focused on implementing policy iteration and improvement in Python. From there, we looked at value iteration. Finally, we concluded this chapter by setting up an agent test against ...
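As a compact reminder of how value iteration applies the Bellman optimality equation, here is a minimal sketch. It is not the chapter's own implementation: the four-state chain environment, the `step` function, and the constants `GAMMA`, `N_STATES`, and `TERMINAL` are hypothetical names chosen only for illustration.

```python
# Minimal value-iteration sketch on a hypothetical 4-state chain.
# The environment below is an assumption for illustration only.
GAMMA = 0.9          # discount factor applied to future rewards
N_STATES = 4         # states 0..3 laid out in a line
TERMINAL = 3         # reaching this state ends the episode
ACTIONS = (0, 1)     # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic transition: return (next_state, reward)."""
    if action == 0:
        next_state = max(state - 1, 0)
    else:
        next_state = min(state + 1, N_STATES - 1)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def action_values(state, values):
    """One-step lookahead: r + gamma * V(s') for every action."""
    q = []
    for action in ACTIONS:
        next_state, reward = step(state, action)
        q.append(reward + GAMMA * values[next_state])
    return q


def value_iteration(theta=1e-8):
    """Sweep the Bellman optimality backup until the largest change is below theta."""
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for state in range(N_STATES):
            if state == TERMINAL:
                continue  # the terminal state keeps a value of 0
            best = max(action_values(state, values))
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < theta:
            return values


def greedy_policy(values):
    """Pick the highest-valued action per state; None marks the terminal state."""
    policy = []
    for state in range(N_STATES):
        if state == TERMINAL:
            policy.append(None)
            continue
        q = action_values(state, values)
        policy.append(q.index(max(q)))
    return policy


if __name__ == "__main__":
    optimal_values = value_iteration()
    print(optimal_values)
    print(greedy_policy(optimal_values))
```

In this toy setup, the values should settle at roughly 0.81, 0.9, and 1.0 for states 0 to 2, and the greedy policy moves right in every non-terminal state. Each state's value is the discounted reward of the best path to the goal, which is the idea of accounting for future rewards.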