Prioritized experience replay
In the previous section, we learnt the importance of experience replay in stabilizing Q-learning by de-correlating the sequential input data. In experience replay, we sample events from an experience buffer using a uniform distribution. This treats every historical event as having the same priority. In practice, however, this is not the case: some events are more likely to help the learning process than others.
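To make the uniform case concrete, here is a minimal sketch of a buffer that samples stored events with equal probability. The class name `UniformExperienceBuffer`, its methods, and the `(state, action, reward, next_state, done)` tuple layout are assumptions made for illustration; they are not taken from the book's code.

```python
import random
from collections import deque


class UniformExperienceBuffer:
    """Hypothetical fixed-size buffer that samples events uniformly at random."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest event once it is full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Every stored event has the same chance of being drawn,
        # regardless of how informative it is.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Because `sample` ignores the content of each event, a rare but informative event is drawn no more often than a routine one.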
One way to find such events is to look for those that do not fit the current estimates of the Q-value. By selecting such events and feeding them into the learning process, you can help the network learn more from each update. This can be understood intuitively; ...
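The idea of favoring events that disagree with the current Q-value estimates could be sketched as follows. This sketch assumes the mismatch is measured by the absolute temporal-difference (TD) error, which is a common choice but is not stated in the text above. The function names, the `eps` offset, and the proportional weighting are likewise illustrative assumptions.

```python
import random


def td_error(reward, gamma, max_next_q, current_q, done):
    """Gap between the one-step target and the current Q-value estimate."""
    # Assumption: a terminal event has no future value to bootstrap from.
    target = reward if done else reward + gamma * max_next_q
    return target - current_q


def sample_prioritized(events, errors, batch_size, eps=1e-3, rng=random):
    """Draw events with probability proportional to |error| + eps."""
    # eps keeps events with zero error from never being sampled again.
    weights = [abs(e) + eps for e in errors]
    # random.choices samples with replacement according to the weights.
    return rng.choices(events, weights=weights, k=batch_size)
```

With this weighting, an event whose outcome the network already predicts well is drawn rarely, while a surprising event is drawn often. Sampling is done with replacement here for simplicity.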