This file contains our implementation of PolicyValueNetwork. In short, we construct a tf.estimator.Estimator that is trained on board states, policies, and game outcomes produced by MCTS self-play. The network has two heads: one acts as a value function, and the other acts as a policy network.
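To make the two-head idea concrete, here is a minimal sketch in plain Python rather than TensorFlow. It is not the book's PolicyValueNetwork: the layer sizes, the random weights, the helper names (dense, random_layer, two_head_forward), and the choice of softmax for the policy head and tanh for the value head are all assumptions made for illustration.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(logits):
    # Subtract the largest logit first for numerical stability.
    peak = max(logits)
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_out, n_in):
    """Hypothetical helper: small random weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def two_head_forward(features, trunk, policy_head, value_head):
    """Shared trunk feeding a policy head and a value head."""
    hidden = relu(dense(features, *trunk))
    move_probs = softmax(dense(hidden, *policy_head))  # distribution over moves
    value = math.tanh(dense(hidden, *value_head)[0])   # scalar outcome estimate
    return move_probs, value


rng = random.Random(0)
n_features, n_hidden, n_moves = 9, 8, 10  # toy sizes, assumed for the sketch
trunk = random_layer(rng, n_hidden, n_features)
policy_head = random_layer(rng, n_moves, n_hidden)
value_head = random_layer(rng, 1, n_hidden)

board = [rng.choice([-1.0, 0.0, 1.0]) for _ in range(n_features)]
move_probs, value = two_head_forward(board, trunk, policy_head, value_head)
```

Both heads read the same hidden representation, so a single forward pass yields a move distribution and an outcome estimate for the same board state.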

First, we define some layers that will be used by PolicyValueNetwork:

import functools
import logging
import os.path

import tensorflow as tf

import features
import preprocessing
import utils
from config import GLOBAL_PARAMETER_STORE, GOPARAMETERS
from constants import *

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def create_partial_bn_layer(params):
    return functools.partial(tf.layers.batch_normalization,
        momentum
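The helper above is cut off, but the pattern it starts is functools.partial binding layer hyperparameters once so that later call sites pass only what varies. The sketch below shows that pattern in isolation, without TensorFlow. The names batch_norm_stub and create_partial_layer, the params keys, and the numeric values are hypothetical and are not taken from the book's code.

```python
import functools


def batch_norm_stub(inputs, momentum=0.99, epsilon=1e-3, training=False):
    """Hypothetical stand-in for a layer function; it only records its settings."""
    return {"inputs": inputs, "momentum": momentum,
            "epsilon": epsilon, "training": training}


def create_partial_layer(layer_fn, params):
    """Pre-bind hyperparameters from a params dict onto layer_fn."""
    return functools.partial(layer_fn,
                             momentum=params["momentum"],
                             epsilon=params["epsilon"])


params = {"momentum": 0.95, "epsilon": 1e-5}  # illustrative values only
bn = create_partial_layer(batch_norm_stub, params)

# Call sites now supply only the per-call arguments.
result = bn([1.0, 2.0], training=True)
```

Binding the hyperparameters in one place keeps every use of the layer consistent and avoids repeating the same keyword arguments throughout the network definition.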
