April 2017
Intermediate to advanced
318 pages
7h 40m
English
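As an agent, our objective is to maximize the total reward from each game. If $r_i$ denotes the reward received at step $i$ and the game ends after $n$ steps, the total reward can be represented as follows:

$$R = r_1 + r_2 + \cdots + r_n = \sum_{i=1}^{n} r_i$$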

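To maximize the total reward over the whole game, the agent should try to maximize the total reward from any time step $t$ onward. The total reward from time step $t$ is denoted $R_t$ and, with $r_i$ the reward received at step $i$ and $n$ the final step of the game, is represented as:

$$R_t = r_t + r_{t+1} + \cdots + r_n = \sum_{i=t}^{n} r_i$$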

However, rewards become harder to predict the further into the future they lie. To take this into account, our agent should instead try to maximize the total discounted future reward ...
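
The sums described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: it assumes the rewards of one finished game are available as a list, and the function name `reward_from`, the sample rewards, and the discount factor `gamma` (with its value of 0.9) are hypothetical choices made for the example. Python lists are 0-indexed, so step `t` here starts at 0 rather than 1. With `gamma=1.0` the function returns the plain total reward from step `t`; with `gamma` below 1, rewards further in the future count for less.

```python
from typing import Sequence


def reward_from(rewards: Sequence[float], t: int = 0, gamma: float = 1.0) -> float:
    """Sum the rewards from step t onward, weighting the reward that lies
    k steps after t by gamma**k (gamma=1.0 means no discounting)."""
    total = 0.0
    # Walk backwards from the last step: each pass multiplies the running
    # total by gamma once more, so later rewards pick up higher powers.
    for r in reversed(rewards[t:]):
        total = r + gamma * total
    return total


# Hypothetical rewards collected over one five-step game.
episode_rewards = [0.0, 0.0, 1.0, 0.0, 1.0]

total = reward_from(episode_rewards)                   # total reward for the game
from_step_3 = reward_from(episode_rewards, t=3)        # total reward from step 3 on
discounted = reward_from(episode_rewards, gamma=0.9)   # discounted, assumed gamma

print(total, from_step_3, discounted)
```

In this sample the discounted value is smaller than the undiscounted total because both non-zero rewards arrive several steps after the start.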