January 2020
Intermediate to advanced
432 pages
10h 18m
English
DL methods fundamentally require us to feed batches of observed agent events into the neural network. Remember, we train on batches so that the error, or loss, is averaged over many samples rather than driven by a single one. This requirement comes from DL itself rather than from anything specific to RL. As such, each time our agent takes an action, we want to store the observed state, action, next state, reward, and returns in a container called ReplayBuffer, which holds a number of previous events. We then randomly sample events from the replay buffer and feed them into the neural network for training. Let's see how the buffer is constructed again by reopening the sample Chapter_6_DQN.py and following this exercise:
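Before opening the file, it may help to see the idea in isolation. The following is a minimal sketch of a replay buffer, not the code from Chapter_6_DQN.py: the method names `push` and `sample`, the `capacity` parameter, and the tuple layout (including a `done` flag marking the end of an episode) are assumptions made for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Illustrative store of agent transitions (hypothetical layout)."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest event once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Store one observed event as a single tuple.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Draw batch_size distinct events uniformly at random,
        # then regroup them field by field for batched training.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

In a training loop, the agent would typically call `push` after every step and start calling `sample` only once `len(buffer)` is at least the batch size. Sampling at random, rather than replaying events in order, breaks up the strong correlation between consecutive steps of the same episode.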