Convolutional layers have been used in various configurations for performing image classification and recognition tasks successfully for some time now. The problem we encounter with using straight 2D CNNs is we are essentially flattening state representations, but generally not in a good way. This means that we are taking a visual observation of a 3D space and flattening it to a 2D image that we then try and extract important features from. This results in an agent thinking it is in the same state if it recognizes the same visual features from potentially different locations in the same 3D environment. This creates confusion in the agent and you can visualize this by watching an agent just wander in ...
ResNet for visual observation encoding
Get Hands-On Reinforcement Learning for Games now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.