O'Reilly logo

Python Reinforcement Learning Projects by Rajalingappaa Shanmugamani, Yang Wenzhuo, Sean Saito

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Data preparation

Careful readers may notice that a suffix, v0, follows each game name, and come up with the following questions: What is the meaning of v0? Is it allowable to replace it with v1 or v2? Actually, this suffix has a relationship with the data preprocessing step for the screen images (observations) extracted from the Atari environment.

There are three modes for each game, for example, Breakout, BreakoutDeterministic, and BreakoutNoFrameskip, and each mode has two versions, for example, Breakout-v0 and Breakout-v4. The main difference between the three modes is the value of the frameskip parameter in the Atari environment. This parameter indicates the number of frames (steps) the one action is repeated on. This is called the

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required