ES on CartPole
The complete example is in Chapter16/01_cartpole_es.py. In this example, we use a single environment to check the fitness of the perturbed network weights. Our fitness function will be the undiscounted total reward for the episode:
#!/usr/bin/env python3
import gym
import time
import numpy as np

import torch
import torch.nn as nn

from tensorboardX import SummaryWriter
From the import statements, you can see how self-contained our example is. We're not using PyTorch optimizers, as we do not perform backpropagation at all. In fact, we could avoid PyTorch completely and work only with NumPy, as the only thing we use PyTorch for is to perform a forward pass and calculate the network's output.
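To make the NumPy-only remark concrete, here is a minimal sketch of a forward pass and an episode-level fitness evaluation written without PyTorch. It is not the code from Chapter16/01_cartpole_es.py. The network shape (one hidden layer of 32 units with ReLU and a softmax output), the greedy action choice, and the names StubEnv, init_params, forward, and evaluate are all assumptions made for illustration. StubEnv is a hypothetical stand-in that mimics the classic gym interface, where reset() returns an observation and step() returns (obs, reward, done, info), so that the sketch runs without gym installed.

```python
import numpy as np


class StubEnv:
    """Hypothetical stand-in for CartPole with the classic gym API.

    Every step yields a reward of 1.0 and the episode ends after
    episode_len steps, regardless of the action taken.
    """

    def __init__(self, episode_len=10, obs_size=4):
        self.episode_len = episode_len
        self.obs_size = obs_size
        self.steps = 0

    def reset(self):
        self.steps = 0
        return np.zeros(self.obs_size, dtype=np.float32)

    def step(self, action):
        self.steps += 1
        obs = np.full(self.obs_size, 0.01 * self.steps, dtype=np.float32)
        done = self.steps >= self.episode_len
        return obs, 1.0, done, {}


def init_params(obs_size, hidden_size, n_actions, rng):
    """Create weights for an assumed two-layer policy network."""
    return {
        "w1": rng.normal(scale=0.1, size=(obs_size, hidden_size)),
        "b1": np.zeros(hidden_size),
        "w2": rng.normal(scale=0.1, size=(hidden_size, n_actions)),
        "b2": np.zeros(n_actions),
    }


def forward(params, obs):
    """Forward pass in plain NumPy: linear -> ReLU -> linear -> softmax."""
    hidden = np.maximum(0.0, obs @ params["w1"] + params["b1"])
    logits = hidden @ params["w2"] + params["b2"]
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()


def evaluate(env, params):
    """Play one episode greedily; return (undiscounted total reward, steps)."""
    obs = env.reset()
    total_reward = 0.0
    steps = 0
    while True:
        probs = forward(params, obs)
        action = int(np.argmax(probs))  # greedy action, an assumption here
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps
```

Calling evaluate(StubEnv(), init_params(4, 32, 2, np.random.default_rng(0))) returns the fitness of one set of weights together with the number of steps taken. With a real environment, the same function would be called once per perturbed copy of the weights.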
MAX_BATCH_EPISODES = 100
MAX_BATCH_STEPS ...