In an object classification problem, given an image (or video clip), we want to know whether it contains a region of interest (ROI) or object. More formally, the task is to decide between "the image contains a car" and "the image does not contain any car." Over the last few years, the ImageNet and PASCAL VOC benchmarks have been widely used for this problem, and the leading approaches on them are based on deep CNN architectures.
Moreover, recent advances in both software and hardware have pushed performance to the next level. Despite the success of state-of-the-art approaches on still images, detecting objects in videos is not easy. However, ...