Let's try to use policy gradients to play a game of Pong. The Andrej Karpathy blog post, at http://karpathy.github.io/2016/05/31/rl/ inspires the implementation here. Recall that, in Breakout, we used four-game frames stacked together as input so that the game dynamics are known to the agent; here, we use the difference between two consecutive game frames as the input to the network. Hence, our agent has information about the present state and the previous state with it:
- The first step, as always, is importing the modules necessary. We import TensorFlow, Numpy, Matplotlib, and gym for the environment:
import numpy as npimport gymimport matplotlib.pyplot as pltimport tensorflow as tffrom gym import wrappers%matplotlib ...