October 2018
Intermediate to advanced
252 pages
6h 49m
English
In games, the reward is usually tied to the game score. In the CartPole game, if the pole is tilted to the right, pushing the cart toward the right has a higher expected future reward than pushing it to the left, because the pole stays upright for longer. To represent this intuition and train on it, it has to be expressed as a formula that can be optimized. The loss is the difference between the prediction and the actual target.
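The idea of a loss as the gap between a prediction and a target can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard Q-learning target (immediate reward plus discounted best future value) and a squared difference, which may not match the formula used in this chapter term for term. The function names, the discount factor `gamma`, and all numbers are hypothetical.

```python
def td_target(reward, next_q_values, gamma=0.5, done=False):
    """Target value for the action taken (assumed Q-learning form).

    reward        -- immediate reward received after the action
    next_q_values -- predicted values of each action in the next state
    gamma         -- discount factor for future reward (hypothetical value)
    done          -- True if the episode ended (the pole fell)
    """
    if done:
        # No future reward is possible once the episode is over.
        return reward
    return reward + gamma * max(next_q_values)


def squared_loss(predicted_q, target_q):
    """Squared difference between the prediction and the target."""
    return (target_q - predicted_q) ** 2


# Hypothetical predictions for the actions [push left, push right].
q_values = [0.25, 0.5]        # current state
next_q_values = [0.5, 1.0]    # next state

action = 1                    # the agent pushed right
target = td_target(reward=1.0, next_q_values=next_q_values)
loss = squared_loss(q_values[action], target)
```

With these made-up numbers, the target for pushing right is 1.5 while the prediction was 0.5, so the loss is 1.0. Training would adjust the model to shrink that gap.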
The loss formula for CartPole can be written as follows:

Where: