In an object classification problem, given an image (or video clip), we want to know whether it contains a region of interest (ROI) or object. More formally, the task is to decide between "the image contains a car" and "the image does not contain any car." Over the last few years, benchmark datasets such as ImageNet and PASCAL VOC (see more at http://host.robots.ox.ac.uk/pascal/VOC/) have been widely used to tackle this problem, with the leading approaches based on deep CNN architectures.
Moreover, the latest advances in both software and hardware have pushed performance to the next level. Despite the success of these state-of-the-art approaches on still images, detecting objects in videos is not easy. However, ...