Implementing REINFORCE

It's time to implement REINFORCE. Here, we provide a bare-bones implementation of the algorithm, without the procedures for debugging and monitoring it. The complete implementation is available in the GitHub repository, so make sure that you check it out.

The code is divided into three main functions and one class:

  • REINFORCE(env_name, hidden_sizes, lr, num_epochs, gamma, steps_per_epoch): This is the function that contains the main implementation of the algorithm.
  • Buffer: This is a class that is used to temporarily store the trajectories.
  • mlp(x, hidden_layer, output_size, activation, last_activation): This is used to build a multi-layer perceptron in TensorFlow.
  • discounted_rewards(rews, gamma): This computes the discounted ...
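
To make the roles of the two smaller helpers more concrete, here is a minimal sketch in plain Python. It is not the code from the book's repository: it assumes that discounted_rewards returns the discounted reward-to-go for each step of a single episode, and the Buffer methods shown (store, get_batch) are hypothetical names chosen for illustration.

```python
def discounted_rewards(rews, gamma):
    """Return the discounted reward-to-go for every step of one episode.

    Assumption: rews holds the rewards of a single, complete episode.
    """
    rtg = [0.0] * len(rews)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it:
    # G_t = r_t + gamma * G_{t+1}
    for t in reversed(range(len(rews))):
        running = rews[t] + gamma * running
        rtg[t] = running
    return rtg


class Buffer:
    """Hypothetical minimal trajectory store (not the book's class)."""

    def __init__(self, gamma=0.99):
        self.gamma = gamma
        self.obs, self.act, self.ret = [], [], []

    def store(self, episode):
        """Add one finished episode, given as (observation, action, reward) tuples."""
        if not episode:
            return
        observations, actions, rewards = zip(*episode)
        self.obs.extend(observations)
        self.act.extend(actions)
        # Store discounted returns instead of raw rewards, since the
        # policy gradient update weights each action by its return.
        self.ret.extend(discounted_rewards(rewards, self.gamma))

    def get_batch(self):
        """Return everything collected so far as three parallel lists."""
        return self.obs, self.act, self.ret

    def __len__(self):
        return len(self.obs)


if __name__ == "__main__":
    print(discounted_rewards([1, 1, 1], 0.5))  # [1.75, 1.5, 1.0]
```

Computing the returns backwards keeps the cost linear in the episode length, because each return is obtained from the next one rather than by summing the remaining rewards again.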
