- We import the necessary modules. We call the sys module's stdout.flush() to force Python to write any buffered data to standard output (the computer monitor, in our case) immediately. The random module is used to draw random samples from the experience replay buffer (the buffer where we store past experience). The datetime module is used to keep track of the time spent in training:
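import gym
import sys
import random
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from datetime import datetime
# Note: imresize was removed from scipy.misc in SciPy 1.3, so this import needs an older SciPy release.
from scipy.misc import imresize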
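To make the roles of sys, random, and datetime concrete, here is a minimal sketch of how the three modules might work together. It is not the training code itself: the replay_buffer list, the tuple layout, and the sample_batch helper are hypothetical names introduced only for illustration.

```python
import random
import sys
from datetime import datetime

# Hypothetical replay buffer: a plain list of
# (state, action, reward, next_state, done) tuples with dummy values.
replay_buffer = [(i, i % 4, 1.0, i + 1, False) for i in range(100)]


def sample_batch(buffer, batch_size):
    """Draw batch_size distinct past experiences uniformly at random."""
    return random.sample(buffer, batch_size)


# Record the start time so the elapsed time can be reported afterwards.
start = datetime.now()
batch = sample_batch(replay_buffer, 32)
elapsed = datetime.now() - start

# Write a progress message and flush so it appears immediately,
# even when standard output is buffered.
sys.stdout.write("Sampled %d transitions in %s\n" % (len(batch), elapsed))
sys.stdout.flush()
```

random.sample draws without replacement, so a batch never contains the same stored transition twice. Subtracting two datetime values yields a timedelta, which prints in a readable hours:minutes:seconds form.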
- We define the hyperparameters for the training; you can experiment by changing them. These parameters define the minimum and maximum size of the experience replay buffer and the number ...