How to do it...

Now, it is time to implement the policy gradient algorithm with PyTorch:

  1. As before, import the necessary packages, create an environment instance, and obtain the dimensions of the observation and action space:
>>> import gym
>>> import torch
>>> env = gym.make('CartPole-v0')
>>> n_state = env.observation_space.shape[0]
>>> n_action = env.action_space.n
  2. Define the run_episode function, which simulates an episode with the given input weight and returns the total reward and the computed gradients. In each step, it does the following:
  • Calculates the probabilities, probs, for both actions based on the current state and input weight
  • Samples an action, action, based on the resulting probabilities
  • Computes the ...
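
The first two bullets can be sketched for a single step as follows. This is a minimal illustration, not the book's run_episode: it assumes a linear softmax policy in which weight has shape (n_state, n_action), and the helper name policy_step is hypothetical. The gradient of the log-probability is included only as an assumption about what a policy gradient update typically needs; it does not reconstruct the truncated step above.

```python
import torch

def policy_step(state, weight):
    """Hypothetical helper: one step of a linear softmax policy."""
    scores = torch.matmul(state, weight)         # one score per action
    probs = torch.softmax(scores, dim=0)         # action probabilities, sum to 1
    action = torch.multinomial(probs, 1).item()  # sample an action index
    # Assumption: gradient of log pi(action | state) with respect to weight.
    # For a linear softmax policy it equals outer(state, one_hot(action) - probs).
    one_hot = torch.zeros_like(probs)
    one_hot[action] = 1.0
    grad = state.view(-1, 1) * (one_hot - probs).view(1, -1)
    return probs, action, grad

# Illustrative 4-dimensional state and two actions, matching CartPole's sizes.
state = torch.tensor([0.1, -0.2, 0.3, 0.4])
weight = torch.zeros(4, 2)
probs, action, grad = policy_step(state, weight)
```

With an all-zero weight, both actions are equally likely, so the sampled action is a fair coin flip. The returned grad has the same shape as weight, so it could be scaled and added to the weight directly in a gradient ascent update.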
