June 2018
Intermediate to advanced
546 pages
13h 30m
English
The complete example is in Chapter16/01_cartpole_es.py. In this example, we use a single environment to check the fitness of the perturbed network weights. Our fitness function will be the undiscounted total reward for the episode:
#!/usr/bin/env python3
import gym
import time
import numpy as np

import torch
import torch.nn as nn

from tensorboardX import SummaryWriter
The import statements show how self-contained our example is. We're not using PyTorch optimizers, as we do not perform backpropagation at all. In fact, we could avoid PyTorch completely and work only with NumPy, as the only thing we use PyTorch for is the forward pass that calculates the network's output.
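To make the NumPy-only remark concrete, here is a minimal sketch of a forward pass and a fitness evaluation written without PyTorch. It is not the code from Chapter16/01_cartpole_es.py: the layer sizes, the greedy action choice, and the names ToyEnv, init_params, forward, and evaluate are all assumptions made for illustration. ToyEnv is a hypothetical stand-in that only mimics the classic Gym reset()/step() interface, so the sketch runs without Gym installed.

```python
import numpy as np


class ToyEnv:
    """Hypothetical stand-in for a Gym environment (not CartPole).

    Mimics the classic Gym interface: reset() returns an observation and
    step() returns (observation, reward, done, info).
    """

    def __init__(self, max_steps=20):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        self.steps += 1
        obs = np.full(4, self.steps / self.max_steps, dtype=np.float32)
        reward = 1.0  # one unit of reward per step survived
        done = self.steps >= self.max_steps
        return obs, reward, done, {}


def init_params(rng, obs_size=4, hidden=32, n_actions=2):
    """Weights and biases of a two-layer network (sizes are assumptions)."""
    return [
        rng.normal(0.0, 0.1, (obs_size, hidden)),
        np.zeros(hidden),
        rng.normal(0.0, 0.1, (hidden, n_actions)),
        np.zeros(n_actions),
    ]


def forward(params, obs):
    """Forward pass only: linear -> ReLU -> linear -> softmax."""
    w1, b1, w2, b2 = params
    hidden = np.maximum(0.0, obs @ w1 + b1)
    logits = hidden @ w2 + b2
    exp = np.exp(logits - logits.max())  # shift for numerical stability
    return exp / exp.sum()


def evaluate(env, params):
    """Fitness: undiscounted total reward of a single episode."""
    obs = env.reset()
    total_reward = 0.0
    steps = 0
    while True:
        probs = forward(params, obs)
        action = int(np.argmax(probs))  # greedy action (an assumption)
        obs, reward, done, _ = env.step(action)
        total_reward += reward  # no discount factor is applied
        steps += 1
        if done:
            break
    return total_reward, steps


rng = np.random.default_rng(0)
params = init_params(rng)
fitness, steps = evaluate(ToyEnv(max_steps=20), params)
print(fitness, steps)
```

Because no gradients are needed, the parameters are plain arrays that can be perturbed with noise and passed back to evaluate. With the toy environment, every episode lasts exactly max_steps steps, so the fitness is the same for any weights; a real environment such as CartPole is what makes the fitness depend on the policy.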
MAX_BATCH_EPISODES = 100
MAX_BATCH_STEPS ...