We already discussed the Pong environment in Chapter 4, Policy Gradients. We will use the following code to set up the imports, hyperparameters, and the Pong-v0 environment from OpenAI Gym for our A3C implementation:
```python
import multiprocessing
import threading
import tensorflow as tf
import numpy as np
import gym
import os
import shutil
import matplotlib.pyplot as plt

game_env = 'Pong-v0'
num_workers = multiprocessing.cpu_count()
max_global_episodes = 100000
global_network_scope = 'globalnet'
global_iteration_update = 20
gamma = 0.9               # discount factor
beta = 0.0001             # entropy regularization coefficient
lr_actor = 0.0001         # learning rate for actor
lr_critic = 0.0001        # learning rate for critic
global_running_rate = []
global_episode = 0

env = gym.make(game_env)
num_actions = env.action_space.n

tf.reset_default_graph()
```
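To see how two of these hyperparameters fit together: each A3C worker rolls out up to `global_iteration_update` steps, then uses `gamma` to compute n-step discounted returns, bootstrapping from the critic's value estimate of the last state. The helper below is a minimal illustrative sketch of that return computation (it is not part of the book's code; the function name and the example rewards are assumptions for illustration):

```python
import numpy as np

gamma = 0.9  # discount factor, matching the value above

def discounted_returns(rewards, bootstrap_value, gamma=0.9):
    """Compute n-step discounted returns for a rollout, working
    backwards from the critic's bootstrap value V(s_{t+n})."""
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running  # R_t = r_t + gamma * R_{t+1}
        returns.append(running)
    return list(reversed(returns))

# Example: a 3-step rollout with rewards [1, 0, 1] and bootstrap value 0.5
print(discounted_returns([1, 0, 1], 0.5, gamma))
```

The worker would then fit the critic to these returns and use `returns - V(s_t)` as the advantage in the actor's policy-gradient update.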
The input state image preprocessing ...