October 2019
Intermediate to advanced
366 pages
12h 4m
English
The value function used as a baseline, approximated with a neural network, can be implemented by adding a few lines to our previous code:
    ...
    # placeholder that will contain the reward to go values (i.e. the y values)
    rtg_ph = tf.placeholder(shape=(None,), dtype=tf.float32, name='rtg')

    # MLP value function: one output per state, squeezed to shape (batch,)
    s_values = tf.squeeze(mlp(obs_ph, hidden_sizes, 1, activation=tf.tanh))

    # MSE loss function between the reward to go and the predicted state values
    v_loss = tf.reduce_mean((rtg_ph - s_values)**2)

    # value function optimization
    v_opt = tf.train.AdamOptimizer(vf_lr).minimize(v_loss)
    ...
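To show what these few lines do without TensorFlow, here is a minimal pure-Python sketch of the same idea. It is an illustration under stated assumptions, not the book's implementation: a linear value function V(s) = w·s + b stands in for the MLP, full-batch gradient descent stands in for Adam, and the helper names (`rewards_to_go`, `predict`, `fit_value_function`) and the toy trajectories are hypothetical. The sketch computes the reward to go (the y values), fits V by minimizing the same mean squared error as `v_loss`, and subtracts V(s) from the reward to go to obtain baseline-corrected returns.

```python
# Sketch only. Assumptions: linear V(s) = w.s + b instead of the MLP,
# plain gradient descent instead of Adam; all names are hypothetical.

def rewards_to_go(rews, gamma=0.99):
    """Discounted reward to go for every step of one trajectory."""
    rtg = [0.0] * len(rews)
    running = 0.0
    for t in reversed(range(len(rews))):
        running = rews[t] + gamma * running
        rtg[t] = running
    return rtg


def predict(w, b, s):
    """Linear state-value estimate V(s) = w.s + b."""
    return sum(wi * si for wi, si in zip(w, s)) + b


def fit_value_function(states, targets, lr=0.05, epochs=500):
    """Minimise mean((target - V(s))**2), the same MSE as v_loss."""
    n, dim = len(states), len(states[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for s, y in zip(states, targets):
            err = predict(w, b, s) - y  # d/dV of (y - V)**2 is 2 * err
            for i in range(dim):
                grad_w[i] += 2.0 * err * s[i] / n
            grad_b += 2.0 * err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy data (hypothetical): two short trajectories over the same 1-D states.
trajs = [([[0.0], [1.0], [2.0]], [1.0, 0.0, 1.0]),
         ([[0.0], [1.0], [2.0]], [0.0, 0.0, 1.0])]

states, targets = [], []
for obs, rews in trajs:
    states += obs
    targets += rewards_to_go(rews, gamma=1.0)  # the "y values" fed to rtg_ph

w, b = fit_value_function(states, targets)

# Baseline-corrected returns: reward to go minus the learned state value.
advantages = [y - predict(w, b, s) for s, y in zip(states, targets)]
print(w, b, advantages)
```

On this toy data the fit should approach w ≈ -0.25 and b ≈ 1.417, the least-squares solution, and because the model has a bias term the corrected returns sum to roughly zero. Since the baseline depends only on the state, subtracting it reduces the variance of the policy gradient without biasing it.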