This file contains our implementation of PolicyValueNetwork. In short, we construct a tf.estimator.Estimator that is trained on board states, policies, and game outcomes produced by MCTS self-play. The network has two heads that share a common body: one acts as a value function, and the other acts as a policy network.
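To make the two-headed layout concrete, here is a minimal sketch in plain Python rather than TensorFlow. It is not the implementation in this file. The layer sizes, the random weights, the helper names (dense, random_layer, two_headed_forward), and the choice of a softmax policy output and a tanh value output are all assumptions made for illustration. The only point is that a single shared computation feeds both heads.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_out, n_in):
    """Hypothetical helper: small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def two_headed_forward(board_features, trunk, policy_head, value_head):
    """Run the shared trunk once, then feed its output to both heads."""
    hidden = [max(0.0, h) for h in dense(board_features, *trunk)]  # ReLU
    # Policy head (assumed softmax): a probability for every move.
    policy = softmax(dense(hidden, *policy_head))
    # Value head (assumed tanh): a single scalar in [-1, 1].
    value = math.tanh(dense(hidden, *value_head)[0])
    return policy, value


rng = random.Random(0)
# Toy sizes, assumed for illustration: a 3x3 board plus one pass move.
n_features, n_hidden, n_moves = 9, 8, 10
trunk = random_layer(rng, n_hidden, n_features)
policy_head = random_layer(rng, n_moves, n_hidden)
value_head = random_layer(rng, 1, n_hidden)

board = [rng.choice([-1.0, 0.0, 1.0]) for _ in range(n_features)]
policy, value = two_headed_forward(board, trunk, policy_head, value_head)
```

In this sketch the policy output would be compared against the MCTS policy during training and the value output against the game outcome, so both training signals shape the same shared features.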
First, we define some layers that will be used by PolicyValueNetwork:
import functools
import logging
import os.path

import tensorflow as tf

import features
import preprocessing
import utils
from config import GLOBAL_PARAMETER_STORE, GOPARAMETERS
from constants import *

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def create_partial_bn_layer(params):
    return functools.partial(tf.layers.batch_normalization, momentum