Region proposal network
In the first stage, the RPN takes an image of any size as input and outputs a set of rectangular regions of interest (RoIs) where an object might be located. The RPN itself is built from the first p convolutional layers of the backbone model (13 for VGG and 5 for ZF net; see the preceding diagram). Once the input image has been propagated to the last shared convolutional layer, the algorithm takes that layer's feature map and slides another small network over each of its locations. At each location, the small network outputs whether an object is present in each of the k anchor boxes centered there (the concept of an anchor box is the same as in YOLO). This concept is illustrated on the left-hand side image ...
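To make the idea of k anchor boxes per feature-map location more concrete, here is a minimal sketch in plain Python. It is not taken from the book: the function name generate_anchors is hypothetical, and the stride of 16 and the three scales and three aspect ratios (giving k = 9) are assumptions based on commonly cited Faster R-CNN defaults, not values stated in the text above.

```python
from itertools import product
from math import sqrt


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchor boxes in input-image coordinates.

    A group of k = len(scales) * len(ratios) anchors is centered on every
    location of a feat_h x feat_w feature map. All default values are
    illustrative assumptions.
    """
    anchors = []
    for row, col in product(range(feat_h), range(feat_w)):
        # Center of this feature-map cell, projected back onto the image.
        cx = (col + 0.5) * stride
        cy = (row + 0.5) * stride
        for scale, ratio in product(scales, ratios):
            # Keep the box area at scale**2 while setting height / width = ratio.
            w = scale / sqrt(ratio)
            h = scale * sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


# A tiny 2 x 3 feature map yields 2 * 3 * 9 anchors.
anchors = generate_anchors(feat_h=2, feat_w=3)
print(len(anchors))
```

In a real RPN, the small sliding network would produce one objectness score for each of these boxes; the sketch only enumerates the boxes themselves.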