June 2018
Intermediate to advanced
546 pages
13h 30m
English
The source code is in Chapter16/03_cartpole_ga.py and has much in common with our ES example. The difference is that the gradient ascent code is gone, replaced by the following network mutation function:
import copy
import numpy as np
import torch

def mutate_parent(net):
    # Work on a deep copy so the parent's weights stay intact
    new_net = copy.deepcopy(net)
    for p in new_net.parameters():
        # Standard normal noise with the same shape as the parameter
        noise_t = torch.from_numpy(np.random.normal(size=p.data.size()).astype(np.float32))
        p.data += NOISE_STD * noise_t
    return new_net

The goal of the function is to create a mutated copy of the given policy by adding random noise to all of its weights. The parent's weights are left untouched because parents are selected randomly with replacement, so the same network may be used again later.
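To illustrate why the copy matters, here is a minimal sketch that uses only the standard library. It is not the book's code: nested Python lists stand in for network parameters, and the next_generation helper, the rng argument, and the toy parents are assumptions made for illustration. It shows parents being sampled with replacement and mutated without being modified themselves.

```python
import copy
import random

NOISE_STD = 0.01  # same value as the chapter's constant


def mutate_parent(weights, rng):
    """Return a mutated copy of `weights`; the input is left unchanged."""
    new_weights = copy.deepcopy(weights)
    for layer in new_weights:
        for i in range(len(layer)):
            # Add Gaussian noise scaled by NOISE_STD to every weight
            layer[i] += NOISE_STD * rng.gauss(0.0, 1.0)
    return new_weights


def next_generation(parents, population_size, rng):
    """Hypothetical helper: sample parents with replacement and mutate them."""
    children = []
    for _ in range(population_size):
        parent = rng.choice(parents)  # the same parent may be picked repeatedly
        children.append(mutate_parent(parent, rng))
    return children


rng = random.Random(0)
# Two toy "networks", each a list of layers holding plain float weights
parents = [[[0.0, 0.0], [0.0]], [[1.0, 1.0], [1.0]]]
snapshot = copy.deepcopy(parents)
children = next_generation(parents, population_size=5, rng=rng)
```

Because every child is built from a deep copy, the parents list compares equal to its snapshot after the call, so any parent can safely be drawn again for another child.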
NOISE_STD = 0.01
POPULATION_SIZE = 50
PARENTS_COUNT ...