How to do it...

Let's go ahead and implement the hill-climbing algorithm with PyTorch:

  1. As before, import the necessary packages, create an environment instance, and obtain the dimensions of the observation and action space:
>>> import gym
>>> import torch
>>> env = gym.make('CartPole-v0')
>>> n_state = env.observation_space.shape[0]
>>> n_action = env.action_space.n
  2. We will reuse the run_episode function we defined in the previous recipe. Given an input weight, it simulates an episode and returns the total reward; a brief sketch is included below as a reminder.
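
For reference, here is a minimal sketch of run_episode along the lines of the previous recipe; the details (the linear state-to-action mapping and greedy action selection) are assumed from that recipe rather than shown in this excerpt:

>>> def run_episode(env, weight):
...     # Sketch: score each action with a linear mapping of the state,
...     # act greedily, and accumulate the reward until the episode ends.
...     state = env.reset()
...     total_reward = 0
...     is_done = False
...     while not is_done:
...         state = torch.from_numpy(state).float()
...         action = torch.argmax(torch.matmul(state, weight))
...         state, reward, is_done, _ = env.step(action.item())
...         total_reward += reward
...     return total_reward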
  3. Let's make it 1,000 episodes for now:

>>> n_episode = 1000
  4. We need to keep track of the best total reward on the fly, as well as the corresponding weight, so let's specify their starting values:
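
As a minimal sketch of that initialization (the variable names best_total_reward and best_weight are assumptions, since the excerpt is cut off here), we can start from a zero best reward and a random weight matrix of shape (n_state, n_action):

>>> best_total_reward = 0
>>> # Random initial weight mapping the n_state observations to the n_action actions
>>> best_weight = torch.rand(n_state, n_action)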
