Let's go ahead and implement the hill-climbing algorithm with PyTorch:
- As before, import the necessary packages, create an environment instance, and obtain the dimensions of the observation and action spaces:
>>> import gym
>>> import torch
>>> env = gym.make('CartPole-v0')
>>> n_state = env.observation_space.shape[0]
>>> n_action = env.action_space.n
- We will reuse the run_episode function defined in the previous recipe, so we will not repeat it here. As before, given an input weight, it simulates one episode and returns the total reward.
- Next, specify how many episodes to run. Let's make it 1,000 for now:
>>> n_episode = 1000
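For readers who do not have the previous recipe at hand, here is a minimal sketch of what run_episode might look like. It is not reproduced from that recipe. It assumes a linear policy that picks the action with the highest score from state times weight, a (env, weight) argument order, and the older Gym API in which reset() returns only the observation and step() returns four values. StubEnv is a hypothetical stand-in so the snippet runs without Gym; in the recipe you would pass the CartPole env instead.

```python
import torch


class StubEnv:
    """Hypothetical stand-in for a Gym environment (old 4-tuple step API)."""

    def __init__(self, n_state=4, max_steps=5):
        self.n_state = n_state
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * self.n_state

    def step(self, action):
        # Reward of 1 per step, episode ends after max_steps steps
        self.t += 1
        is_done = self.t >= self.max_steps
        return [0.1 * self.t] * self.n_state, 1.0, is_done, {}


def run_episode(env, weight):
    """Simulate one episode with a linear policy; return the total reward.

    Assumed signature and policy, not taken from the original recipe.
    """
    state = env.reset()
    total_reward = 0.0
    is_done = False
    while not is_done:
        state = torch.as_tensor(state, dtype=torch.float32)
        # Score each action as state x weight and act greedily
        action = torch.argmax(torch.matmul(state, weight))
        state, reward, is_done, _ = env.step(action.item())
        total_reward += reward
    return total_reward


# A random weight of shape (n_state, n_action), here 4 x 2 as in CartPole
weight = torch.rand(4, 2)
total_reward = run_episode(StubEnv(), weight)
```

With a real environment, the call would be run_episode(env, weight) with weight of shape (n_state, n_action). Newer Gym and Gymnasium releases return extra values from reset() and step(), so the unpacking would need adjusting there.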
- We need to keep track of the best total reward on the fly, as well as the corresponding weight. So, let's specify their starting ...