June 2018
Intermediate to advanced
546 pages
13h 30m
English
Nowadays, almost nobody uses the vanilla PG method, because the much more stable Actor-Critic method exists; it will be the topic of the two following chapters. However, I still want to show the PG implementation, as it establishes important concepts and metrics for checking the PG method’s performance. So, we will start with the much simpler CartPole environment and, in the next section, check the method's performance on our favorite Pong environment. The complete code for the following example is available in Chapter09/04_cartpole_pg.py.
GAMMA = 0.99
LEARNING_RATE = 0.001
ENTROPY_BETA = 0.01
BATCH_SIZE = 8
REWARD_STEPS = 10
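To make the roles of some of these constants more concrete, here is a minimal sketch in plain Python. It is not taken from Chapter09/04_cartpole_pg.py. It assumes that REWARD_STEPS is the number of steps over which rewards are discounted and summed, and that ENTROPY_BETA scales an entropy term that is subtracted from the loss; the excerpt states neither in full. The helper names discounted_return, entropy, and entropy_loss_term are hypothetical.

```python
import math

GAMMA = 0.99
ENTROPY_BETA = 0.01
REWARD_STEPS = 10


def discounted_return(rewards, gamma=GAMMA):
    """Return r0 + gamma*r1 + gamma^2*r2 + ... for a list of rewards."""
    total = 0.0
    # Walk backwards so each step folds in the discounted future reward.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    # Zero-probability actions contribute nothing, so they are skipped.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_loss_term(probs, beta=ENTROPY_BETA):
    """Assumed form of the entropy term: -beta * H(probs).

    Adding this to a loss that is minimized favors higher-entropy
    (less certain) action distributions.
    """
    return -beta * entropy(probs)


if __name__ == "__main__":
    # CartPole gives a reward of 1.0 per step; take REWARD_STEPS of them.
    print(discounted_return([1.0] * REWARD_STEPS))
    # A uniform policy over two actions has the maximum entropy, ln 2.
    print(entropy([0.5, 0.5]), entropy_loss_term([0.5, 0.5]))
```

Under these assumptions, ten rewards of 1.0 discount to roughly 9.56 rather than 10, and a uniform two-action policy has an entropy of ln 2, about 0.693, so its scaled term is small compared with the reward signal.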
Besides the already familiar hyperparameters, we have two new ones. The entropy beta value is the scale of the entropy ...