March 2020
Intermediate to advanced
366 pages
9h 8m
English
In the previously stated ideas, we have used single-object classification or detection networks to achieve multiple-object detection. In all scenarios, for each predefined region, we feed the network with the complete image or part of it multiple times. In other words, we have multiple passes that result in heavy architecture.
Wouldn't it be nice to have a network that, once fed with an image, detects all the objects in the scene in a single pass? An idea that you could try is to make more outputs for our single-object detector so that it predicts multiple boxes instead of one. This is a good idea, but there is a problem. Suppose we have multiple dogs in the scene that could appear in different locations and in different ...