We will now look at how to implement A3C using Python and TensorFlow. Here, the policy network and value network share the same feature representation. We implement two kinds of policies: one is based on the CNN architecture used in DQN, and the other is based on LSTM.
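To make "share the same feature representation" concrete, here is a minimal, framework-free sketch. It is not the network built in this section: the class name SharedActorCritic, the layer sizes, and the random initialization are all hypothetical. It only illustrates the structure, in which one trunk computes features once and two separate heads read them, one producing action probabilities and one producing a state value.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_out, n_in):
    """Return (weights, biases) with small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class SharedActorCritic:
    """Hypothetical toy model: one shared trunk feeding two heads."""

    def __init__(self, n_inputs, n_hidden, n_outputs, seed=0):
        rng = random.Random(seed)
        self.trunk = random_layer(rng, n_hidden, n_inputs)          # shared features
        self.policy_head = random_layer(rng, n_outputs, n_hidden)   # action logits
        self.value_head = random_layer(rng, 1, n_hidden)            # state value

    def forward(self, state):
        # The features are computed once and reused by both heads.
        features = relu(dense(state, *self.trunk))
        policy = softmax(dense(features, *self.policy_head))
        value = dense(features, *self.value_head)[0]
        return policy, value


model = SharedActorCritic(n_inputs=8, n_hidden=16, n_outputs=4)
policy, value = model.forward([0.5] * 8)
```

Here policy is a probability distribution over the four actions and value is a single scalar. Because both heads read the same features, gradients from the policy loss and the value loss would both update the trunk in a trainable version of this model.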
The CNN-based policy is implemented by the FFPolicy class:
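import tensorflow as tf  # tf.placeholder is part of the TensorFlow 1.x API

class FFPolicy:

    def __init__(self, input_shape=(84, 84, 4), n_outputs=4, network_type='cnn'):
        # Dimensions of the stacked input frames
        self.width = input_shape[0]
        self.height = input_shape[1]
        self.channel = input_shape[2]
        self.n_outputs = n_outputs
        self.network_type = network_type
        # Entropy regularization coefficient
        self.entropy_beta = 0.01

        # Input states in channels-first layout: (batch, channel, width, height)
        self.x = tf.placeholder(dtype=tf.float32,
                                shape=(None, self.channel, self.width, self.height))
        self.build_model()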
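Note that input_shape is given as (width, height, channel), while the placeholder is declared channels-first as (batch, channel, width, height). The sketch below illustrates that mapping in plain Python. The helper names placeholder_shape and to_channels_first are hypothetical and not part of FFPolicy, and it is an assumption that states are reordered this way before being fed to the network.

```python
def placeholder_shape(input_shape):
    """Map (width, height, channel) to the placeholder's
    (batch, channel, width, height) shape, with batch left as None."""
    width, height, channel = input_shape
    return (None, channel, width, height)


def to_channels_first(frame):
    """Reorder a nested-list frame from [width][height][channel]
    to [channel][width][height]."""
    width = len(frame)
    height = len(frame[0])
    channel = len(frame[0][0])
    return [[[frame[i][j][k] for j in range(height)]
             for i in range(width)]
            for k in range(channel)]


shape = placeholder_shape((84, 84, 4))

# A tiny 2 x 3 frame with 4 channels; each entry encodes its own index.
small = [[[100 * i + 10 * j + k for k in range(4)]
          for j in range(3)]
         for i in range(2)]
converted = to_channels_first(small)
```

With the default input_shape, shape is (None, 4, 84, 84), and converted[k][i][j] holds the value that was at small[i][j][k].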
The constructor ...