November 2017
Intermediate to advanced
274 pages
6h 16m
English
The authors used a multiresolution architecture that aimed to strike a compromise by having two separate streams of processing (called Fovea and Context Streams) over two spatial resolutions (see the following figure). A 178 × 178 frame video clip is the input to the network:

The context stream receives the downsampled frames at half the original spatial resolution (89 × 89 pixels). The fovea stream receives the center 89 × 89 region at the original resolution. In this way, the total input dimensionality is halved.
Read now
Unlock full access