PG on CartPole
Nowadays, almost nobody uses the vanilla PG method, as the much more stable Actor-Critic method exists; it will be the topic of the following two chapters. However, I still want to show the PG implementation, as it establishes very important concepts and metrics for checking the PG method's performance. So, we will start with the much simpler CartPole environment, and in the next section we will check its performance on our favorite Pong environment. The complete code for the following example is available in Chapter09/04_cartpole_pg.py.
GAMMA = 0.99
LEARNING_RATE = 0.001
ENTROPY_BETA = 0.01
BATCH_SIZE = 8
REWARD_STEPS = 10
Besides the already familiar hyperparameters, we have two new ones. The entropy beta value is the scale of the entropy bonus, which is subtracted from the loss to keep the policy from collapsing too quickly into a deterministic one. The REWARD_STEPS value specifies how many steps ahead the Bellman equation is unrolled to estimate the discounted total reward of every transition.
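To make the two new hyperparameters concrete, here is a minimal NumPy-only sketch (the function names `discounted_nstep` and `entropy_bonus` are illustrative, not from the book's code): the first unrolls up to REWARD_STEPS discounted rewards from each position, and the second computes the entropy of the policy's action distribution scaled by ENTROPY_BETA, which would be subtracted from the policy loss.

```python
import numpy as np

GAMMA = 0.99
ENTROPY_BETA = 0.01
REWARD_STEPS = 10

def discounted_nstep(rewards, gamma=GAMMA, steps=REWARD_STEPS):
    """For each position, sum up to `steps` future rewards, discounted by gamma."""
    result = []
    for i in range(len(rewards)):
        total, discount = 0.0, 1.0
        for r in rewards[i:i + steps]:
            total += discount * r
            discount *= gamma
        result.append(total)
    return result

def entropy_bonus(probs, beta=ENTROPY_BETA):
    """Entropy of a categorical action distribution, scaled by beta.

    In the PG loss this term is subtracted, so higher entropy
    (a less certain policy) lowers the loss.
    """
    probs = np.asarray(probs, dtype=np.float64)
    return beta * -np.sum(probs * np.log(probs))

# CartPole yields reward 1.0 every step, so the unrolled estimate for the
# first transition of a long episode is sum(GAMMA**k for k in range(10)).
print(round(discounted_nstep([1.0] * 12)[0], 4))   # -> 9.5618
```

Note that this truncated n-step sum is only an approximation of the true discounted return; the missing tail beyond REWARD_STEPS is exactly what the critic will estimate in the Actor-Critic method of the following chapters.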