April 2018
Here, a deep Q-network is trained, and two feature-extraction models are used to build part of the agent's state representation. The two models, Image-Zooms and Pool45-Crops, work as follows:
For the Image-Zooms model, each region is resized to 224x224 and fed into VGG-16 through the Pool5 layer to obtain a feature map. For the Pool45-Crops model, the full-resolution image is fed into VGG-16 through the Pool5 layer, and the feature maps extracted from the whole image are then pooled for each region of interest (ROI).
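The pooling step of the Pool45-Crops model can be illustrated with a minimal NumPy sketch. It is not the book's implementation: the function name `roi_max_pool`, the choice of max pooling, the 7x7 output grid, and the convention that ROI coordinates are given in feature-map cells are all assumptions made for illustration. The random array stands in for a feature map that would normally come from VGG-16.

```python
import numpy as np


def roi_max_pool(feature_map, roi, out_size=7):
    """Max-pool one region of a (C, H, W) feature map to (C, out_size, out_size).

    roi is (y0, x0, y1, x1) in feature-map cells, end-exclusive.
    Hypothetical helper, written only to illustrate ROI pooling.
    """
    y0, x0, y1, x1 = roi
    region = feature_map[:, y0:y1, x0:x1]
    channels, height, width = region.shape
    if height == 0 or width == 0:
        raise ValueError("ROI does not overlap the feature map")

    # Split the region into an out_size x out_size grid of bins.
    y_edges = np.linspace(0, height, out_size + 1)
    x_edges = np.linspace(0, width, out_size + 1)

    pooled = np.empty((channels, out_size, out_size), dtype=feature_map.dtype)
    for i in range(out_size):
        r0 = int(np.floor(y_edges[i]))
        r1 = max(int(np.ceil(y_edges[i + 1])), r0 + 1)  # at least one row
        for j in range(out_size):
            c0 = int(np.floor(x_edges[j]))
            c1 = max(int(np.ceil(x_edges[j + 1])), c0 + 1)  # at least one column
            # Keep the strongest activation per channel inside this bin.
            pooled[:, i, j] = region[:, r0:r1, c0:c1].max(axis=(1, 2))
    return pooled


# Stand-in for a whole-image feature map (512 channels, 20x30 cells).
rng = np.random.default_rng(0)
whole_image_features = rng.random((512, 20, 30))

# Two regions of different sizes are both reduced to the same fixed shape.
large_region = roi_max_pool(whole_image_features, (0, 0, 20, 30))
small_region = roi_max_pool(whole_image_features, (5, 10, 14, 22))
print(large_region.shape, small_region.shape)
```

The point of the sketch is that the full image is processed once and each region only costs a cheap pooling pass over the shared feature map, whereas the Image-Zooms approach runs the network again for every resized region.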
Both feature-extraction models output a 7x7 feature map, which is fed into the common block (as shown in the following architecture). These feature maps and the memory vector (discussed previously) are ...