April 2018
Intermediate to advanced
334 pages
10h 18m
English
In reinforcement learning, we want the Q-function Q(s,a) to estimate the expected future reward of taking action a in state s, so that the best action for a state s is the one with the highest Q-value. The Q-function is estimated with Q-learning, which iteratively updates it using the Bellman equation as follows:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$$

Here:
Q(s,a) = Q-value for the pair of current state s and action a
α = learning rate, which controls the speed of convergence
γ = discount factor applied to future rewards ...
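The Q-learning update described above can be sketched for a tabular Q-function as follows. This is a minimal illustration, not the book's implementation. It assumes the standard Q-learning target r + γ · max over a' of Q(s', a'), where r is the immediate reward and s' is the next state. It also assumes Q is stored as a Python dict keyed by (state, action), with unseen pairs defaulting to 0. The names `q_update` and `best_action`, the toy states, and the default values α = 0.1 and γ = 0.9 are hypothetical choices made only for this example.

```python
from collections import defaultdict


def q_update(q, s, a, r, s_next, next_actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new Q(s, a).

    Hypothetical helper for illustration. q maps (state, action) -> value, and
    missing entries read as 0.0 because q is a defaultdict(float).
    next_actions lists the actions available in s_next; pass an empty list
    for a terminal state, so the target is just the reward r.
    """
    # Best value reachable from the next state (0.0 if s_next is terminal).
    best_next = max(q[(s_next, a2)] for a2 in next_actions) if next_actions else 0.0
    # Bootstrapped target: immediate reward plus discounted best next value.
    td_target = r + gamma * best_next
    # Move Q(s, a) a fraction alpha of the way toward the target.
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]


def best_action(q, s, actions):
    """Greedy choice: the action with the highest Q-value in state s (hypothetical helper)."""
    return max(actions, key=lambda a: q[(s, a)])


# Toy example with hypothetical states "s0" and "s1".
q = defaultdict(float)
q[("s1", "left")] = 2.0
q[("s1", "right")] = 5.0
new_value = q_update(q, "s0", "right", r=1.0, s_next="s1",
                     next_actions=["left", "right"], alpha=0.5, gamma=0.9)
print(new_value)                                   # 2.75
print(best_action(q, "s1", ["left", "right"]))     # right
```

With α = 0.5 and γ = 0.9, the target for Q(s0, right) is 1 + 0.9 × 5 = 5.5, so one update moves it from 0 halfway to 2.75. Repeating the same transition (with Q(s1, ·) held fixed) drives it toward 5.5, which shows how the learning rate controls the step size while the discount factor weights the next state's value.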