Deep Reinforcement Learning Hands-On by Maxim Lapan

PG on CartPole

Nowadays, almost nobody uses the vanilla PG method, because the much more stable Actor-Critic method exists; it will be the topic of the next two chapters. However, I still want to show the PG implementation, as it establishes important concepts and metrics for checking the PG method’s performance. So, we will start with the much simpler CartPole environment and, in the next section, check the method's performance on our favorite Pong environment. The complete code for the following example is available in Chapter09/04_cartpole_pg.py.

GAMMA = 0.99
LEARNING_RATE = 0.001
ENTROPY_BETA = 0.01
BATCH_SIZE = 8
REWARD_STEPS = 10

Besides the already familiar hyperparameters, we have two new ones. The entropy beta value is the scale of the entropy ...
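The excerpt cuts off before it explains how the entropy scale is applied, so the following is only a minimal sketch of one common convention, not the code from Chapter09/04_cartpole_pg.py. It assumes the policy outputs raw action scores (logits) and that the scaled entropy is subtracted from the loss, so that a more uniform, more exploratory policy gets a lower loss. The helper names softmax, entropy, and entropy_loss are hypothetical and chosen for illustration; only ENTROPY_BETA comes from the listing above.

```python
import math

ENTROPY_BETA = 0.01  # value taken from the hyperparameter listing above


def softmax(logits):
    """Convert raw action scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy H = -sum(p * log p), skipping zero-probability actions."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_loss(logits, beta=ENTROPY_BETA):
    """Hypothetical helper: a loss term that decreases as policy entropy grows.

    Assumption: the term is added to the policy loss, so minimizing the total
    loss pushes the policy away from collapsing onto a single action.
    """
    return -beta * entropy(softmax(logits))


# A policy that is undecided between two actions has the highest entropy
# and therefore the lowest (most negative) entropy loss.
uniform_term = entropy_loss([0.0, 0.0])
peaked_term = entropy_loss([10.0, 0.0])
```

Under this assumption, a larger beta makes the entropy term weigh more against the policy loss, and beta = 0 removes the term entirely.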
