January 2018
Intermediate to advanced
310 pages
7h 48m
English
Frame-wise, the prediction of a video may not yield good results due to the downsampling of images, which loses fine details. Using a high-resolution CNN will increase the inference time. Hence, Karpathy et al. (https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/42455.pdf) propose fusing two streams that are run in parallel for video classification. There are two problems with doing frame-wise predictions, namely:
The architecture can be simplified with fewer parameters with two smaller encoders running in parallel. The ...
Read now
Unlock full access