This section shows how to implement the actor-critic architecture using TensorFlow. The code structure is almost the same as the DQN implementation presented in the previous chapter.
The ActorNetwork is a simple MLP that takes the observation state as its input:
class ActorNetwork:

    def __init__(self, input_state, output_dim, hidden_layers,
                 activation=tf.nn.relu):
        self.x = input_state
        self.output_dim = output_dim
        self.hidden_layers = hidden_layers
        self.activation = activation

        with tf.variable_scope('actor_network'):
            self.output = self._build()
            self.vars = tf.get_collection(
                tf.GraphKeys.TRAINABLE_VARIABLES,
                tf.get_variable_scope().name)

    def _build(self):
        layer = self.x
        init_b = tf.constant_initializer(0.01)
        ...
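To make the network's structure concrete before looking at the TensorFlow graph code, here is a framework-free NumPy sketch of the forward pass that `_build` constructs: a stack of ReLU hidden layers followed by a softmax over actions. The hidden sizes, random weight scale, and softmax output are illustrative assumptions, not the book's exact implementation; only the constant bias initializer of 0.01 mirrors `init_b` above.

```python
import numpy as np

def actor_forward(state, hidden_layers, output_dim, seed=0):
    """Illustrative MLP actor forward pass: ReLU hidden layers,
    softmax output over actions. Weights are freshly sampled here
    purely for demonstration (assumed scale 0.1)."""
    rng = np.random.default_rng(seed)
    layer = state
    in_dim = state.shape[-1]
    for h in hidden_layers:
        W = rng.normal(0.0, 0.1, size=(in_dim, h))
        b = np.full(h, 0.01)                     # mirrors init_b = 0.01
        layer = np.maximum(0.0, layer @ W + b)   # ReLU activation
        in_dim = h
    W_out = rng.normal(0.0, 0.1, size=(in_dim, output_dim))
    b_out = np.full(output_dim, 0.01)
    logits = layer @ W_out + b_out
    # Numerically stable softmax turns logits into action probabilities
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# A batch of 2 four-dimensional observations, 2 discrete actions
probs = actor_forward(np.ones((2, 4)), hidden_layers=[16, 16], output_dim=2)
```

Each row of `probs` is a valid probability distribution over the two actions, which is exactly what the actor's output head must produce for a discrete action space.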