Chapter 5: Solving the Reinforcement Learning Problem

In the previous chapter we provided the mathematical foundations for modeling a reinforcement learning problem. In this chapter, we lay the foundation for solving it. Many of the following chapters will focus on some specific solution approaches that will rise on this foundation. To this end, we first cover the dynamic programming (DP) approach, with which we introduce some key ideas and concepts. DP methods provide optimal solutions to Markov decision processes (MDPs), yet they require the complete knowledge and a compact representation of the state transition and reward dynamics of the environment. This could be severely limiting and impractical in a realistic scenario, where the agent ...

Get Mastering Reinforcement Learning with Python now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.