Extending replay with prioritized experience replay

So far, we've seen how a replay buffer, or experience replay mechanism, allows us to pull stored experiences back in batches at a later time in order to train the network graph. These batches were composed of random samples, which works well enough, but of course, we can do better. Instead of simply storing everything and sampling uniformly, we can make two decisions: what data to store and what data to prioritize when sampling. To keep things simple, we will look only at prioritizing what data we extract from the experience replay. By prioritizing the samples we extract, we can hope to dramatically improve the quality of the information we feed to the network for learning, and thus the overall performance of the agent. ...
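To make the idea concrete, here is a minimal sketch of a prioritized replay buffer that samples transitions in proportion to the magnitude of their TD error. This is not the book's exact implementation; the class name, parameters, and the use of importance-sampling weights follow the standard proportional-prioritization scheme and are included here only for illustration.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Illustrative proportional-prioritization buffer (a sketch, not the book's code)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling (0 = uniform)
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float32)
        self.pos = 0

    def store(self, transition):
        # New transitions get the current max priority so they are sampled at least once.
        max_prio = self.priorities.max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.buffer)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        indices = np.random.choice(len(self.buffer), batch_size, p=probs)
        samples = [self.buffer[i] for i in indices]
        # Importance-sampling weights correct the bias introduced by non-uniform sampling.
        weights = (len(self.buffer) * probs[indices]) ** (-beta)
        weights /= weights.max()
        return samples, indices, weights

    def update_priorities(self, indices, td_errors, eps=1e-6):
        # Priority is proportional to the absolute TD error of each sampled transition.
        for idx, err in zip(indices, td_errors):
            self.priorities[idx] = abs(err) + eps
```

In use, the agent would call `sample` to draw a batch, scale each transition's loss by its importance-sampling weight, and then call `update_priorities` with the new TD errors so that surprising transitions are replayed more often.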