March 2020
The goal of this chapter is to track multiple visually salient objects in a video sequence at once. Instead of labeling the objects of interest in the video ourselves, we will let the algorithm decide which regions of a video frame are worth tracking.
In earlier chapters, we learned how to detect simple objects of interest (such as a human hand) in tightly controlled scenarios, and how to infer geometrical features of a visual scene from camera motion. Here, we ask what we can learn about a visual scene from the image statistics of a large number of frames.
In this chapter, we will cover the following topics: