A3C for Pong-v0 in OpenAI gym

We already discussed the Pong environment in Chapter 4, Policy Gradients. We will use the following code to create the A3C for Pong-v0 in OpenAI gym:

import multiprocessing
import threading
import tensorflow as tf
import numpy as np
import gym
import os
import shutil
import matplotlib.pyplot as plt

game_env = 'Pong-v0'
num_workers = multiprocessing.cpu_count()
max_global_episodes = 100000
global_network_scope = 'globalnet'
global_iteration_update = 20
gamma = 0.9
beta = 0.0001
lr_actor = 0.0001   # learning rate for actor
lr_critic = 0.0001  # learning rate for critic
global_running_rate = []
global_episode = 0

env = gym.make(game_env)
num_actions = env.action_space.n

tf.reset_default_graph()
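The globals above set up the A3C coordination scheme: `num_workers` threads each interact with their own copy of the environment and synchronize with a shared global network every `global_iteration_update` steps, stopping once `max_global_episodes` episodes have been consumed in total. As a minimal, framework-free sketch of that coordination pattern (this is illustrative only, not the book's actual worker code; the gradient push/pull is replaced by a comment):

```python
import multiprocessing
import threading

global_iteration_update = 20  # sync with the global net every N local steps
max_global_episodes = 100     # small value here so the sketch finishes quickly

lock = threading.Lock()
global_episode = 0

def worker(worker_id):
    """One A3C worker: claim episodes until the global budget is spent."""
    global global_episode
    local_steps = 0
    while True:
        with lock:
            if global_episode >= max_global_episodes:
                return
            global_episode += 1  # claim one episode from the shared counter
        local_steps += 1
        if local_steps % global_iteration_update == 0:
            # a real worker would push its accumulated gradients to the
            # global network here and pull back the latest weights
            pass

threads = [threading.Thread(target=worker, args=(i,))
           for i in range(multiprocessing.cpu_count())]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(global_episode)  # 100: exactly max_global_episodes episodes were run
```

Because the check-and-increment on `global_episode` happens under a single lock, the workers together run exactly `max_global_episodes` episodes, which is how the real training loop terminates as well.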

The input state image preprocessing ...
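Pong frames from gym arrive as 210x160x3 RGB arrays, and a common preprocessing step is to crop the scoreboard, downsample, and binarize before feeding the frame to the network. A hypothetical sketch of that kind of preprocessing (the exact crop bounds and background pixel values here are typical choices for Pong, not necessarily the book's):

```python
import numpy as np

def preprocess_frame(frame):
    """Crop, downsample, and binarize a raw 210x160x3 Pong frame.

    Illustrative sketch of standard Atari preprocessing; the specific
    crop rows (35:195) and background values (144, 109) are assumptions.
    """
    frame = frame[35:195]                       # crop score bar and bottom edge
    frame = frame[::2, ::2, 0].astype(np.float32)  # downsample 2x, one channel
    frame[frame == 144] = 0                     # erase one background shade
    frame[frame == 109] = 0                     # erase the other background shade
    frame[frame != 0] = 1                       # paddles and ball become 1
    return frame

# a raw gym observation is a (210, 160, 3) uint8 array
dummy = np.zeros((210, 160, 3), dtype=np.uint8)
print(preprocess_frame(dummy).shape)  # (80, 80)
```

Reducing each frame to an 80x80 binary image shrinks the input roughly 30-fold, which matters when many worker threads are each running their own forward passes.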
