The controller.py module is where everything comes together. Here we implement the Controller, which handles training each child network as well as updating its own parameters. We first implement a helper function that calculates the exponential moving average of a list of numbers. As mentioned previously, this average of the past rewards serves as the baseline in our REINFORCE gradient calculation:
import logging

import numpy as np
import tensorflow as tf

from child_network import ChildCNN
from cifar10_processor import get_tf_datasets_from_numpy
from config import child_network_params, controller_params

logger = logging.getLogger(__name__)


def ema(values):
    """
    Helper function for keeping ...
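Because the listing above is cut off before the body of ema, the following is only a minimal sketch of how an exponential moving average of past rewards could serve as a REINFORCE baseline. The function name ema_baseline, the smoothing factor alpha, and the sample rewards are all assumptions made for illustration; they are not taken from the module itself, which may weight the values differently.

```python
def ema_baseline(rewards, alpha=0.5):
    """Illustrative exponential moving average of a sequence of rewards.

    `alpha` is a hypothetical smoothing factor: values closer to 1 give
    more weight to the most recent rewards.
    """
    if len(rewards) == 0:
        # With no reward history, fall back to a zero baseline.
        return 0.0
    average = float(rewards[0])
    for reward in rewards[1:]:
        # Blend the newest reward with the running average.
        average = alpha * float(reward) + (1.0 - alpha) * average
    return average


# Hypothetical validation accuracies of previously trained child networks.
past_rewards = [0.2, 0.4, 0.6]
baseline = ema_baseline(past_rewards)

# In REINFORCE, the baseline is subtracted from the newest reward so that
# only better-than-average architectures produce a positive learning signal.
new_reward = 0.7
advantage = new_reward - baseline
```

Subtracting the baseline leaves the expected gradient unchanged while reducing its variance, which is why a running average of past rewards is a common choice here.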