Implementation of DDPG

This section shows you how to implement the actor-critic architecture of DDPG using TensorFlow. The code structure is almost the same as the DQN implementation shown in the previous chapter.

The ActorNetwork is a simple MLP that takes the observation state as its input:

class ActorNetwork:

    def __init__(self, input_state, output_dim, hidden_layers, activation=tf.nn.relu):
        self.x = input_state
        self.output_dim = output_dim
        self.hidden_layers = hidden_layers
        self.activation = activation

        with tf.variable_scope('actor_network'):
            self.output = self._build()
            self.vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                          tf.get_variable_scope().name)

    def _build(self):
        layer = self.x
        init_b = tf.constant_initializer(0.01)
        ...
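The `_build` method is truncated above, but its job is simply to stack the hidden layers and end with a bounded output so that actions stay inside the environment's action range. As a framework-free illustration of what such a forward pass computes, here is a plain-NumPy sketch (the function name `actor_forward` and all variable names here are hypothetical, not from the book's code):

```python
import numpy as np

def actor_forward(state, weights, biases):
    """Forward pass of a simple MLP actor: ReLU hidden layers,
    then a tanh output layer that bounds each action in [-1, 1]."""
    layer = state
    for W, b in zip(weights[:-1], biases[:-1]):
        layer = np.maximum(0.0, layer @ W + b)  # ReLU hidden layer
    # Final layer: tanh squashes the output into the action range [-1, 1]
    return np.tanh(layer @ weights[-1] + biases[-1])

# Example: 3-dim observation, two hidden layers of 4 units, 2-dim action
rng = np.random.default_rng(0)
dims = [3, 4, 4, 2]
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(dims[:-1], dims[1:])]
biases = [np.full(n, 0.01) for n in dims[1:]]  # mirrors the 0.01 bias init above

action = actor_forward(rng.normal(size=(1, 3)), weights, biases)
print(action.shape)  # one action vector per input state
```

The tanh output is a common choice for continuous-control actors because the bounded activation can be rescaled to match the environment's action limits.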
