November 2024
Intermediate to advanced
716 pages
19h 34m
English
In this first chapter of Part 3 of the book, we will consider an alternative way to handle Markov decision process (MDP) problems: a whole family of methods called policy gradient methods. In some situations, these methods work better than value-based methods, so it is important to be familiar with them.
In this chapter, we will:
Give an overview of the methods, their motivations, and their strengths and weaknesses compared with the already familiar Q-learning
Start with a simple policy gradient method called REINFORCE and try to apply it to our CartPole environment, comparing it with the deep Q-network (DQN) approach
Discuss problems with the vanilla REINFORCE method and ways to address them ...