Atari frames are 210 x 160 pixels in RGB color, so a single frame has a size of 210 x 160 x 3. With a history of 4 frames, the input would have a dimension of 210 x 160 x 12, or 403,200 values per state. Inputs of this size are computationally demanding, and it becomes difficult to store a large number of frames in the experience replay buffer. A preprocessing step that reduces the dimensionality is therefore necessary. In the original DQN implementation, the following preprocessing pipeline is used:
- RGB colors are converted into grayscale
- The images are downsampled to 110 x 84 and then cropped to 84 x 84
- The last four frames (the current frame and the three preceding it) are stacked to form a single input
- The frames are normalized
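
The steps above can be sketched with NumPy alone. This is a minimal illustration, not the original DQN code: the luminance weights, the nearest-neighbour downsampling, the `crop_top` offset, the division by 255, and the names `preprocess_frame` and `FrameStack` are all assumptions made for the example.

```python
from collections import deque

import numpy as np


def preprocess_frame(frame, crop_top=18):
    """Map one 210x160x3 uint8 RGB frame to an 84x84 float32 array in [0, 1]."""
    # 1. Grayscale. The luminance weights are an assumption; the text only
    #    says that RGB is converted into grayscale.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = frame.astype(np.float32) @ weights            # shape (210, 160)

    # 2. Downsample to 110x84 by nearest-neighbour index sampling
    #    (assumed; a real pipeline would likely use an image library).
    rows = np.arange(110) * gray.shape[0] // 110
    cols = np.arange(84) * gray.shape[1] // 84
    small = gray[rows][:, cols]                          # shape (110, 84)

    # 3. Crop 84 rows. crop_top is a hypothetical offset; the right value
    #    depends on where the playing area sits in a given game.
    cropped = small[crop_top:crop_top + 84, :]           # shape (84, 84)

    # 4. Normalize pixel values to [0, 1] (assumed meaning of "normalized").
    return np.clip(cropped / 255.0, 0.0, 1.0).astype(np.float32)


class FrameStack:
    """Keep the k most recent preprocessed frames and expose them as one state."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # At the start of an episode, fill the history with the first frame.
        processed = preprocess_frame(frame)
        for _ in range(self.k):
            self.frames.append(processed)
        return self.state()

    def push(self, frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(preprocess_frame(frame))
        return self.state()

    def state(self):
        # Stack along the last axis: shape (84, 84, k).
        return np.stack(list(self.frames), axis=-1)


# Usage with random data standing in for emulator output.
rng = np.random.default_rng(0)
stack = FrameStack(k=4)
first = rng.integers(0, 256, size=(210, 160, 3), dtype=np.uint8)
state = stack.reset(first)
second = rng.integers(0, 256, size=(210, 160, 3), dtype=np.uint8)
next_state = stack.push(second)
```

With these choices, each state holds 84 x 84 x 4 = 28,224 values instead of the 403,200 values of four raw RGB frames.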
Furthermore, because the games are run at a ...