The result of our app can be seen in the following image:
Throughout the video sequence, the algorithm picks up the locations of the players and tracks them frame by frame: mean-shift tracking supplies one set of bounding boxes, which is then combined with the bounding boxes returned by the saliency detector.
It is only through the clever combination of the saliency map and tracking that we can exclude false positives such as line markings and artifacts of the saliency map. The magic happens in cv2.groupRectangles, which requires a similar bounding box to appear at least twice in the box_all list; otherwise, it rejects the box as an outlier.