March 2019
Intermediate to advanced
538 pages
13h 38m
English
Classical object detection in computer vision typically uses a sliding window, scanning the whole image with windows of different sizes and at different scales. The main drawback is the time it takes to scan the image several times to find objects.
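To get a feel for that cost, the following minimal sketch counts how many windows a naive scan would have to evaluate. The function name, the image size, the square window sizes, and the stride are illustrative assumptions, not values from the text:

```python
def count_sliding_windows(image_w, image_h, window_sizes, stride):
    """Count the windows a naive sliding-window scan evaluates.

    Assumes square windows and the same stride in both directions
    (illustrative choices, not taken from the text).
    """
    total = 0
    for win in window_sizes:
        # A window larger than the image cannot be placed at all.
        if win > image_w or win > image_h:
            continue
        cols = (image_w - win) // stride + 1  # horizontal positions
        rows = (image_h - win) // stride + 1  # vertical positions
        total += cols * rows
    return total


# Hypothetical 640 x 480 image scanned with three window sizes.
print(count_sliding_windows(640, 480, window_sizes=[64, 128, 256], stride=16))
# prints 2133
```

Each of those windows would need its own classifier evaluation, which is why the scan becomes expensive as more sizes and finer strides are added.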
YOLO takes a different approach: it divides the image into an S x S grid. For each grid cell, YOLO checks B bounding boxes, and the deep learning model outputs, for each box, its coordinates, the confidence that it contains an object, and a confidence for each category in the training dataset. The following screenshot shows the S x S grid:
YOLO is trained with a 19 x 19 grid and 5 bounding boxes per grid cell, using 80 categories. ...
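As a rough sketch of what these numbers imply, the snippet below maps a point to its grid cell and computes the size of the prediction for a grid of 19, 5 boxes, and 80 categories. It assumes that each box is encoded as four coordinates, one object confidence, and one confidence per category, and it uses a hypothetical 608 x 608 input image; the function names are illustrative as well:

```python
def grid_cell(cx, cy, image_w, image_h, s):
    """Return the (row, col) of the grid cell that contains point (cx, cy)."""
    # Clamp so that a point on the right or bottom edge stays inside the grid.
    col = min(int(cx * s / image_w), s - 1)
    row = min(int(cy * s / image_h), s - 1)
    return row, col


def output_shape(s, b, num_classes):
    """Shape of the prediction for an s x s grid with b boxes per cell.

    Assumed per-box encoding: 4 coordinates + 1 object confidence
    + num_classes category confidences.
    """
    values_per_box = 4 + 1 + num_classes
    return (s, s, b * values_per_box)


print(output_shape(19, 5, 80))             # (19, 19, 425)
print(grid_cell(304, 304, 608, 608, 19))   # (9, 9)
```

Under these assumptions, a single pass over the image yields 19 x 19 x 5 = 1,805 candidate boxes, instead of one classifier evaluation per sliding window.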