Pong using policy gradients

Let's use policy gradients to play a game of Pong. The implementation here is inspired by Andrej Karpathy's blog post at http://karpathy.github.io/2016/05/31/rl/. Recall that in Breakout we stacked four game frames together as input so that the agent could infer the game dynamics; here, we instead use the difference between two consecutive game frames as the input to the network. The agent therefore has information about both the present state and the previous state:

  1. The first step, as always, is to import the necessary modules. We import TensorFlow, NumPy, Matplotlib, and gym for the environment:
import numpy as np
import gym
import matplotlib.pyplot as plt
import tensorflow as tf
from gym import wrappers
%matplotlib ...
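
To make the frame-difference input described in the introduction concrete, here is a minimal sketch using only NumPy. The function names `preprocess` and `frame_difference` are hypothetical. The crop rows, the downsampling factor, and the background colour values (144 and 109) are assumptions based on the preprocessing in Karpathy's post, not details confirmed by this chapter. It assumes raw Pong frames are 210x160x3 `uint8` arrays.

```python
import numpy as np


def preprocess(frame):
    """Reduce a 210x160x3 uint8 frame to a flat vector of 6400 zeros and ones.

    The crop rows and background colours are assumptions, not taken from
    this chapter.
    """
    img = frame[35:195]               # crop to the playing field (assumed rows)
    img = img[::2, ::2, 0].copy()     # downsample by 2 and keep one colour channel
    img[img == 144] = 0               # erase background (assumed colour value)
    img[img == 109] = 0               # erase background (assumed colour value)
    img[img != 0] = 1                 # paddles and ball become 1
    return img.astype(np.float32).ravel()


def frame_difference(current, previous):
    """Return the preprocessed current frame minus the preprocessed previous one.

    On the first step of an episode there is no previous frame, so a vector
    of zeros is returned.
    """
    cur = preprocess(current)
    if previous is None:
        return np.zeros_like(cur)
    return cur - preprocess(previous)
```

In the resulting vector, a pixel that an object has just moved into is +1, a pixel it has just left is -1, and everything static is 0, so a single input carries both position and direction of motion.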
