October 2018
As we mentioned previously, reinforcement learning algorithms seek to maximize their potential future reward. In deep learning terminology, we call this the expected reward. At each time step, t, in the training process of a reinforcement learning algorithm, we want to maximize the return, R:
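One conventional way to write the return is sketched below. The symbols are assumptions for illustration: r_k stands for the reward received at time step k, and T for a final time step.

```latex
R_t = r_t + r_{t+1} + r_{t+2} + \dots + r_T
```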

Our final reward is the sum of the expected rewards at each time step; we call this the cumulative reward. Mathematically, we can write the preceding equation as follows:
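In summation notation, the cumulative reward is commonly written as $R_t = \sum_{k=t}^{T} r_k$, where r_k (the reward at step k) and T (a final time step) are symbols assumed here for illustration. A minimal Python sketch of the same sum follows; the function name `cumulative_reward` and the sample reward values are hypothetical and only serve to make the arithmetic concrete.

```python
from typing import Sequence


def cumulative_reward(rewards: Sequence[float], t: int = 0) -> float:
    """Return the undiscounted sum of rewards from time step t to the end.

    rewards[k] is assumed to hold the reward received at time step k.
    """
    return float(sum(rewards[t:]))


# Hypothetical rewards collected over a four-step episode.
episode_rewards = [1.0, 0.0, -1.0, 2.0]

print(cumulative_reward(episode_rewards))       # return measured from step 0
print(cumulative_reward(episode_rewards, t=2))  # return measured from step 2
```

Summing from step t rather than from the start reflects that, at any point in training, only the rewards still to come can be influenced by the agent's next actions.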

Theoretically, this process could go on forever; the termination ...