The module is where everything comes together. We will implement the Controller, which handles training each child network as well as its own parameter updates. We first implement a helper function that calculates an exponential moving average of a list of numbers. We use this as the baseline function for our REINFORCE gradient calculation, as mentioned previously, to calculate the exponential moving average of the past rewards:

import loggingimport numpy as npimport tensorflow as tffrom child_network import ChildCNNfrom cifar10_processor import get_tf_datasets_from_numpyfrom config import child_network_params, controller_paramslogger = logging.getLogger(__name__)def ema(values):    """ Helper function for keeping ...

Get Python Reinforcement Learning Projects now with O’Reilly online learning.

O’Reilly members experience live online training, plus books, videos, and digital content from 200+ publishers.