Let's go ahead and implement the hill-climbing algorithm with PyTorch:
- As before, import the necessary packages, create an environment instance, and obtain the dimensions of the observation and action space:
>>> import gym
>>> import torch
>>> env = gym.make('CartPole-v0')
>>> n_state = env.observation_space.shape[0]
>>> n_action = env.action_space.n
- We will reuse the run_episode function we defined in the previous recipe, so we will not repeat it here. Again, given the input weight, it simulates an episode and returns the total reward.
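For reference, a sketch of what such a run_episode helper might look like, assuming a simple linear policy (the action is the argmax of the state-weight product) and the classic gym step API where step returns four values:

```python
import torch

def run_episode(env, weight):
    """Simulate one episode with a linear policy and return the total reward.

    The action is chosen as argmax(state @ weight); `weight` has shape
    (n_state, n_action).
    """
    state = env.reset()
    total_reward = 0
    is_done = False
    while not is_done:
        state = torch.from_numpy(state).float()
        action = torch.argmax(torch.matmul(state, weight))
        state, reward, is_done, _ = env.step(action.item())
        total_reward += reward
    return total_reward
```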
- Let's make it 1,000 episodes for now:
>>> n_episode = 1000
- We need to keep track of the best total reward on the fly, as well as the corresponding weight. So, let's specify their starting values:
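Putting the pieces above together, here is a minimal, self-contained sketch of the hill-climbing loop: start from a random weight, perturb the best weight found so far each episode, and keep the perturbed weight only if it performs at least as well. The run_episode here is a toy stand-in (it rewards weights close to a hidden target, so the snippet runs without gym), and noise_scale is an assumed hyperparameter, not one fixed by the text:

```python
import torch

torch.manual_seed(0)
n_state, n_action, n_episode = 4, 2, 1000
noise_scale = 0.01  # assumed perturbation size

# Toy stand-in for run_episode so this sketch runs without gym:
# the reward is higher the closer the weight is to a hidden target.
target = torch.rand(n_state, n_action)
def run_episode(env, weight):
    return 1.0 / (1.0 + torch.norm(weight - target).item())

env = None  # placeholder; the real recipe uses the CartPole environment

# Hill climbing: track the best total reward and the corresponding weight,
# perturbing the best weight each episode.
best_total_reward = 0.0
best_weight = torch.rand(n_state, n_action)
for episode in range(n_episode):
    weight = best_weight + noise_scale * torch.rand(n_state, n_action)
    total_reward = run_episode(env, weight)
    if total_reward >= best_total_reward:
        best_total_reward = total_reward
        best_weight = weight
```

With the real run_episode and the CartPole environment in place of the stand-ins, the same loop gives the recipe's hill-climbing agent.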