October 2019
Intermediate to advanced
366 pages
12h 4m
English
Q-learning approximates the optimal Q-function by bootstrapping from the best action value currently estimated for the next state. The Q-learning update is very similar to the SARSA update, except that it takes the maximum state-action value over the actions available in the next state:

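$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

where $\alpha$ is the usual learning rate and $\gamma$ is the discount factor.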
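
As a rough illustration of the update rule, here is a minimal tabular sketch. It assumes the Q-function is stored as a NumPy array indexed by `[state, action]`. The function name `q_learning_update`, the default values of `alpha` and `gamma`, and the `done` flag for terminal states are illustrative assumptions, not taken from the text.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning update to Q in place; return the TD error.

    Q is assumed to be a 2-D array of shape (n_states, n_actions).
    The `done` flag is an assumption of this sketch: on terminal
    transitions there is no next state to bootstrap from.
    """
    # Maximum state-action value in the next state (zero if terminal).
    best_next = 0.0 if done else np.max(Q[s_next])
    # Temporal-difference error: target minus current estimate.
    td_error = r + gamma * best_next - Q[s, a]
    # Move the current estimate toward the target by a step of size alpha.
    Q[s, a] += alpha * td_error
    return td_error

# Toy table with 3 states and 2 actions (made-up values).
Q = np.zeros((3, 2))
Q[1] = [0.5, 2.0]

td = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.9)
print(td, Q[0, 1])
```

In this toy call the target is 1.0 + 0.9 · 2.0 = 2.8, so with a learning rate of 0.5 the entry `Q[0, 1]` moves halfway from 0 to 2.8.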
While the SARSA update is done on the behavior policy (such as an $\epsilon$-greedy policy), the Q-learning update is done on the greedy target policy that results from the maximum ...
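
The contrast between the two targets can be sketched in a few lines. This is only an illustration under stated assumptions: a tabular Q stored as a NumPy array, and hypothetical helper names (`epsilon_greedy`, `sarsa_target`, `q_learning_target`) that do not come from the text.

```python
import numpy as np

def epsilon_greedy(Q, s, epsilon, rng):
    """Behavior policy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))

def sarsa_target(Q, r, s_next, a_next, gamma):
    """On-policy target: uses the action the behavior policy actually chose."""
    return r + gamma * Q[s_next, a_next]

def q_learning_target(Q, r, s_next, gamma):
    """Off-policy target: uses the greedy (maximum) action value."""
    return r + gamma * np.max(Q[s_next])

# Made-up values: in state 1, action 1 is the greedy action.
Q = np.array([[0.0, 0.0],
              [1.0, 3.0]])
rng = np.random.default_rng(0)

# The action SARSA bootstraps from depends on the exploratory behavior policy.
a_next = epsilon_greedy(Q, s=1, epsilon=0.2, rng=rng)
print("SARSA target:     ", sarsa_target(Q, 1.0, 1, a_next, 0.9))
print("Q-learning target:", q_learning_target(Q, 1.0, 1, 0.9))
```

The two targets coincide whenever the behavior policy happens to pick the greedy action; they differ only on exploratory steps, which is where the on-policy and off-policy updates diverge.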