Getting ready
Pong is a two-player game in which the goal is to bounce the ball past the other player. The agent can move its paddle up or down (or, yes, take the standard NoOp action). In the OpenAI Gym environment, one of the players is a built-in AI that plays the game well. Our goal is to train the second player, our agent, using policy gradients. The agent becomes more proficient with each game it plays. Because the code is built to run for only 500 episodes, we should add a provision to save the agent state at specified checkpoints and, in the next run, start by loading the previous checkpoint. We achieve this by first declaring a saver, then using the TensorFlow saver.save method to save the present network state (checkpoint), and lastly ...
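The save-then-resume pattern described above can be sketched without TensorFlow. The following is a minimal, standard-library illustration, not the recipe's actual code: it uses pickle as a stand-in for the TensorFlow saver, and the names save_checkpoint, load_checkpoint, train, and the file pong_agent.pkl are hypothetical. The "weights" list and its update are placeholders for the real policy network and its policy-gradient step.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temp file, then rename, so a crash mid-save keeps the old file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def train(path, episodes_per_run=500, checkpoint_every=100):
    """Run one training session, resuming from the previous checkpoint if present."""
    # Assumption: the agent state is an episode counter plus a list of weights.
    state = load_checkpoint(path) or {"episode": 0, "weights": [0.0, 0.0, 0.0]}
    last_episode = state["episode"] + episodes_per_run
    while state["episode"] < last_episode:
        # Placeholder for playing one game and applying a policy-gradient update.
        state["weights"] = [w + 0.01 for w in state["weights"]]
        state["episode"] += 1
        if state["episode"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    # Save once more so a run whose length is not a multiple of
    # checkpoint_every still persists its final episodes.
    save_checkpoint(path, state)
    return state


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "pong_agent.pkl")
    first = train(ckpt)   # starts from scratch
    second = train(ckpt)  # resumes where the first run stopped
    print(first["episode"], second["episode"])  # prints: 500 1000
```

Each call to train adds 500 episodes on top of whatever the checkpoint already holds, so repeated runs accumulate training instead of starting over. In a TensorFlow 1.x program, the analogous roles would be played by a tf.train.Saver object and its save and restore methods.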