Chapter 4: Makings of a Markov Decision Process

In the first chapter, we talked about many applications of Reinforcement Learning (RL), from robotics to finance. Before implementing any RL algorithm for such applications, we first need to model them mathematically. The Markov Decision Process (MDP) is the framework we use to model such sequential decision-making problems. MDPs have special characteristics that make them easier to analyze theoretically. Building on that theory, Dynamic Programming (DP) is the field that proposes solution methods for MDPs. RL, in some sense, is a collection of approximate DP approaches that enable us to obtain good (but not necessarily optimal) solutions to very complex problems that are intractable ...

