October 2015
Intermediate to advanced
230 pages
5h 10m
English
The goal of this chapter is to track multiple visually salient objects in a video sequence at once. Instead of labeling the objects of interest in the video ourselves, we will let the algorithm decide which regions of a video frame are worth tracking.
We have previously learned how to detect simple objects of interest (such as a human hand) in tightly controlled scenarios, and how to infer geometrical features of a visual scene from camera motion. In this chapter, we ask what we can learn about a visual scene by looking at the image statistics of a large number of frames. By analyzing the Fourier spectrum of natural images, we will build a saliency map, which allows us to label certain statistically interesting ...
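To make the idea of a Fourier-based saliency map concrete, here is a minimal sketch of one well-known technique, the spectral residual approach. It is an assumption that this is close to what the chapter goes on to build; the function names `box_blur` and `spectral_residual_saliency`, the window sizes, and the NumPy-only mean filter are all illustrative choices, not taken from the text.

```python
import numpy as np


def box_blur(a, k=3):
    """Mean filter with a k x k window and edge padding (NumPy only)."""
    pad = k // 2
    padded = np.pad(a, pad, mode="edge")
    out = np.zeros_like(a, dtype=float)
    # Sum the k*k shifted copies of the padded array, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (k * k)


def spectral_residual_saliency(gray):
    """Return a saliency map in [0, 1] for a 2D grayscale array.

    Hypothetical helper: a sketch of the spectral residual idea,
    not the chapter's own implementation.
    """
    spectrum = np.fft.fft2(gray.astype(float))
    log_amp = np.log(np.abs(spectrum) + 1e-9)  # small offset avoids log(0)
    phase = np.angle(spectrum)

    # The "residual" is what remains of the log-amplitude spectrum
    # after subtracting its locally averaged (smooth) version.
    residual = log_amp - box_blur(log_amp, 3)

    # Recombine the residual with the original phase and transform back.
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = box_blur(sal, 5)  # smooth the map (assumed window size)

    # Rescale to [0, 1]; guard against a completely flat map.
    value_range = sal.max() - sal.min()
    if value_range > 0:
        return (sal - sal.min()) / value_range
    return np.zeros_like(sal)
```

Calling `spectral_residual_saliency(frame)` on a single grayscale frame (a 2D NumPy array) yields a map of the same shape, where larger values mark regions whose frequency content deviates from the smooth trend of the spectrum. Thresholding such a map is one plausible way to pick out candidate regions for tracking.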