In the first stage, the RPN takes an image (of any size) as input and outputs a set of rectangular regions of interest (RoIs), where an object might be located. The RPN itself is created by taking the first p convolutional layers of the backbone model (13 in the case of VGG-16 and 5 for ZF net; see the preceding diagram). Once the input image has been propagated through the last shared convolutional layer, the algorithm takes that layer's feature map and slides another small net over each location of the feature map. For each location, the small net outputs whether an object is present at any of k anchor boxes (the concept of an anchor box is the same as in YOLO). This concept is illustrated on the left-hand side image ...
Region proposal network
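To make the anchor idea concrete, the following is a minimal sketch (not the original implementation) of how the k = 9 anchor boxes can be generated for every location of the feature map. The stride of 16, the scales, and the aspect ratios match the typical VGG-16 configuration from the Faster R-CNN paper, but the function name and defaults here are illustrative assumptions:

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate k = len(scales) * len(ratios) anchors (x1, y1, x2, y2)
    centered at every location of a feat_h x feat_w feature map.
    stride is the cumulative downsampling of the shared conv layers
    (16 for the first 13 conv layers of VGG-16)."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Map the feature-map cell back to its center pixel in the image
            cx = x * stride + stride / 2
            cy = y * stride + stride / 2
            for scale in scales:
                for ratio in ratios:
                    # Keep the anchor area close to scale**2 while
                    # varying the aspect ratio (height/width)
                    w = scale * np.sqrt(1.0 / ratio)
                    h = scale * np.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return np.array(anchors)

# A 600x800 input with stride 16 yields a roughly 37x50 feature map,
# so the RPN evaluates 37 * 50 * 9 = 16,650 anchors
anchors = generate_anchors(37, 50)
print(anchors.shape)  # (16650, 4)
```

The small net then predicts, for each of these anchors, an objectness score and box refinements; low-scoring anchors are discarded, leaving the RoIs that are passed to the second stage.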