January 2018
Beginner to intermediate
284 pages
8h 35m
English
When DQN is applied to different environments whose rewards are not on the same scale, training becomes inefficient. For instance, a positive reward may be worth 100 points in one game but only 10 points in another. Reward clipping is used to normalize rewards and penalties uniformly across all environment settings: each positive reward is clipped to +1 and each negative reward is clipped to -1. This bounds the magnitude of the error signal, which avoids large weight updates and lets the network update its parameters smoothly.
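The clipping rule can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `clip_reward` and the sample reward values are hypothetical, and leaving a zero reward unchanged is an assumption that the text above does not state explicitly.

```python
def clip_reward(reward: float) -> float:
    """Map a raw environment reward to its sign: +1, -1, or 0."""
    if reward > 0:
        return 1.0   # every positive reward becomes +1
    if reward < 0:
        return -1.0  # every negative reward becomes -1
    return 0.0       # assumption: a zero reward stays zero


# Hypothetical raw rewards from games with very different score scales.
raw_rewards = [100.0, 10.0, 0.0, -5.0, -250.0]
clipped = [clip_reward(r) for r in raw_rewards]
print(clipped)
```

In a training loop, a function like this would typically be applied to the reward returned by each environment step before the transition is stored or used to compute the loss. Note that clipping discards magnitude, so a 100-point reward and a 10-point reward look identical to the agent.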