October 2019
Intermediate to advanced
366 pages
12h 4m
English
The over-estimation of Q-values is a well-known problem in Q-learning algorithms. The cause is the max operator: applied to noisy value estimates, it tends to return a value above the true maximum. To see why, let's assume that we have noisy estimates with a mean of 0 but a nonzero variance, as shown in the following illustration. Even though each estimate averages to 0 asymptotically, the maximum taken over several such estimates is greater than 0 in expectation.
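The effect can be checked numerically. The following is a minimal sketch, not code from the book: it assumes Gaussian noise with unit standard deviation, and the helper name `mean_of_max` and its parameters are hypothetical. Every action has a true value of 0, so any positive average of the maximum is pure over-estimation.

```python
import random
import statistics


def mean_of_max(num_actions, num_trials, noise_std=1.0, seed=0):
    """Average, over many trials, of the max of zero-mean noisy estimates.

    Hypothetical helper for illustration: the true value of every
    action is 0, and each estimate is Gaussian noise around it.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(num_trials):
        # One noisy estimate per action, all centered on the true value 0.
        estimates = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        # This is what the max operator in the Q-learning target would pick.
        maxima.append(max(estimates))
    return statistics.fmean(maxima)


single = mean_of_max(num_actions=1, num_trials=20000)
many = mean_of_max(num_actions=10, num_trials=20000)

print(f"1 action:   mean of max = {single:+.3f}")
print(f"10 actions: mean of max = {many:+.3f}")
```

With a single action the average stays close to 0, because the max of one estimate is just that estimate. With ten actions the average is clearly positive, and it grows with both the number of actions and the noise level. A single maximum can still come out negative when all estimates happen to fall below 0, so the bias is a statement about the expectation, not about every sample.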

In Q-learning, this over-estimation is not a real problem until the higher values are uniformly ...