November 2017
Intermediate to advanced
274 pages
6h 16m
English
Sports-1M dataset test set results (200,000 videos and 4,000,000 clips) are summarized in the following table. The approach of multiple networks consistently and significantly outperforms the feature-based baseline. The feature-based approach computes visual words densely over the duration of the video and produces predictions that are based on the complete video-level feature vector, while the authors' networks only see 20 randomly sampled clips individually:

Read now
Unlock full access