We solve the CartPole environment using the REINFORCE with baseline algorithm as follows:
- Import all the necessary packages and create a CartPole instance:
>>> import gym
>>> import torch
>>> import torch.nn as nn
>>> from torch.autograd import Variable
>>> env = gym.make('CartPole-v0')
- The policy network is essentially the same as the PolicyNetwork class we used in the Implementing the REINFORCE algorithm recipe. The difference to keep in mind is that the update method takes the advantage values:
>>> def update(self, advantages, log_probs):
...     """
...     Update the weights of the policy network given the training samples
...     @param advantages: advantage for each step in an episode
...     @param log_probs: log probability for each ...
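To make the role of the two parameters concrete, here is a minimal plain-Python sketch of the arithmetic behind an advantage-weighted policy update. It assumes the usual definition for REINFORCE with baseline, where the advantage is the discounted return minus a baseline value estimate. The helper names (discounted_returns, advantages_from_baseline, policy_loss) and the sample numbers are hypothetical and are not taken from the recipe. In the recipe itself the same quantity would presumably be built from torch tensors so that gradients can flow through the log probabilities.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each return reuses the next one.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages_from_baseline(returns, baseline_values):
    """Advantage A_t = G_t - V(s_t), where V(s_t) comes from a baseline.

    Assumption: the baseline values are given as one estimate per step.
    """
    return [g - v for g, v in zip(returns, baseline_values)]


def policy_loss(advantages, log_probs):
    """Surrogate loss -sum_t log pi(a_t | s_t) * A_t, as a plain float."""
    return -sum(lp * adv for lp, adv in zip(log_probs, advantages))


# Illustrative numbers only: a three-step episode with reward 1 per step.
rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, gamma=0.5)
advantages = advantages_from_baseline(returns, [1.0, 1.0, 1.0])
loss = policy_loss(advantages, [-0.5, -1.0, -2.0])
```

Subtracting the baseline leaves the expected gradient unchanged while reducing its variance, which is why the update works with advantages instead of raw returns.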