In the past few years, several architectures for object detection and classification have been developed using a two-step process. The first step uses a region proposal method to find regions of the input image that are likely to contain an object. The second step runs a simple classifier on the proposed regions to classify their content.
A double-headed neural network gives us faster inference, since only a single forward pass of a single model is needed, and better performance overall.
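To make the single-forward-pass idea concrete, here is a minimal sketch in plain Python. It is a toy illustration, not the actual model: the class `DoubleHeadedNet`, the layer sizes, the random weights, and the choice of a 4-value bounding-box output for the second head are all assumptions made for this example. The point is only that the shared feature extractor runs once, and both heads read the same features.

```python
import random


def linear(x, weights, bias):
    """Dense layer: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def init_layer(rng, n_out, n_in):
    """Random weights and zero biases for a dense layer (toy initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class DoubleHeadedNet:
    """Hypothetical two-headed model: one shared backbone, two output heads."""

    def __init__(self, n_inputs, n_features, n_classes, seed=0):
        rng = random.Random(seed)
        # Shared feature extractor (a stand-in for a real convolutional network).
        self.backbone = init_layer(rng, n_features, n_inputs)
        # Head 1 (assumed): regresses 4 bounding-box coordinates.
        self.box_head = init_layer(rng, 4, n_features)
        # Head 2: produces one score per class.
        self.class_head = init_layer(rng, n_classes, n_features)
        self.backbone_calls = 0  # counts how often the backbone runs

    def extract_features(self, x):
        self.backbone_calls += 1
        return relu(linear(x, *self.backbone))

    def forward(self, x):
        # The expensive part runs once; both heads reuse its output.
        features = self.extract_features(x)
        box = linear(features, *self.box_head)
        logits = linear(features, *self.class_head)
        return box, logits


net = DoubleHeadedNet(n_inputs=8, n_features=16, n_classes=3)
box, logits = net.forward([0.1 * i for i in range(8)])
print(len(box), len(logits), net.backbone_calls)
```

In a two-step pipeline the classifier would run once per proposed region, whereas here the cost of the feature extractor is paid a single time per image, whatever the number of heads.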
From the architectural side, suppose for simplicity that our feature extractor is AlexNet (in reality it is the more complex Inception V3 network): adding a new head to the network changes the model ...