Now, it is time to implement the policy gradient algorithm with PyTorch:
- As before, import the necessary packages, create an environment instance, and obtain the dimensions of the observation and action spaces:
>>> import gym
>>> import torch
>>> env = gym.make('CartPole-v0')
>>> n_state = env.observation_space.shape[0]
>>> n_action = env.action_space.n
- We define the run_episode function, which simulates an episode given the input weight and returns the total reward and the computed gradients. More specifically, it performs the following tasks in each step:
- Calculates the probabilities, probs, for both actions based on the current state and input weight
- Samples an action, action, based on the resulting probabilities
- Computes the ...
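The first two tasks above (computing probs and sampling action) can be sketched in isolation. This is a minimal illustration, not the full run_episode function. It assumes a linear policy whose action scores are the product of the state and the weight, followed by a softmax. The dimensions are hard-coded to CartPole's 4 observation components and 2 actions so that the snippet runs without gym, and the state values and random weight are hypothetical stand-ins:

```python
import torch

# Assumed dimensions for CartPole: 4 observation components, 2 actions.
n_state, n_action = 4, 2

torch.manual_seed(0)
# Hypothetical policy weight with one column per action.
weight = torch.rand(n_state, n_action)
# Stand-in for an observation returned by env.reset() or env.step().
state = torch.tensor([0.02, -0.01, 0.03, 0.04])

# Action scores are a linear function of the state (assumption);
# softmax turns the scores into probabilities that sum to 1.
z = torch.matmul(state, weight)
probs = torch.softmax(z, dim=0)

# Draw one action index (0 or 1) according to those probabilities.
action = int(torch.multinomial(probs, 1).item())
```

Sampling with torch.multinomial, rather than always taking the most probable action, keeps the policy stochastic, so both actions continue to be tried while the weight is being learned.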