Localization as a regression problem

Setting the classification problem aside for a moment and focusing only on localization, we can treat localization as the problem of regressing the four coordinates of the bounding box that contains the subject of the input image.

In practice, there is not much difference between training a CNN to solve a classification task and training it to solve a regression task: the architecture of the feature extractor remains the same, while the classification head is replaced by a regression head. In the end, this only means changing the number of output neurons from the number of classes to 4, one neuron per coordinate of the bounding box.
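The head swap described above can be sketched in a few lines of TensorFlow 2 Keras code. This is only an illustrative sketch, not the book's implementation: the layer sizes, `NUM_CLASSES`, `INPUT_SHAPE`, and the helper names `make_feature_extractor` and `make_model` are all hypothetical, and the mean squared error loss on the regression model is an assumption rather than something stated in the text.

```python
import tensorflow as tf

NUM_CLASSES = 10           # hypothetical number of classes
INPUT_SHAPE = (64, 64, 3)  # hypothetical input image size


def make_feature_extractor():
    # A small convolutional stack, built the same way for both tasks.
    return tf.keras.Sequential(
        [
            tf.keras.layers.Conv2D(16, 3, activation="relu"),
            tf.keras.layers.MaxPool2D(),
            tf.keras.layers.Conv2D(32, 3, activation="relu"),
            tf.keras.layers.GlobalAveragePooling2D(),
        ],
        name="feature_extractor",
    )


def make_model(output_units):
    # Only the number of output neurons of the head differs between tasks.
    inputs = tf.keras.Input(shape=INPUT_SHAPE)
    features = make_feature_extractor()(inputs)
    outputs = tf.keras.layers.Dense(output_units)(features)
    return tf.keras.Model(inputs, outputs)


# Classification head: one output neuron (logit) per class.
classifier = make_model(NUM_CLASSES)
classifier.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Regression head: 4 output neurons, one per bounding box coordinate.
# The mean squared error loss is an assumption made for this sketch.
regressor = make_model(4)
regressor.compile(optimizer="adam", loss="mse")

# Run both models on a random batch of two images to inspect output shapes.
images = tf.random.uniform((2, *INPUT_SHAPE))
class_out = classifier(images)
box_out = regressor(images)
print(class_out.shape, box_out.shape)
```

The two models are built by the same function and differ only in the `output_units` argument, which mirrors the point made in the text: the feature extractor architecture is unchanged and only the head is replaced.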

The idea is that the regression head should learn to output the correct ...
