Chapter 12. Object Detection
The previous chapter introduced two popular algorithms for detecting faces in photographs: Viola-Jones, which relies on machine learning, and MTCNNs, which rely on deep learning. Face detection is a special case of object detection, in which computers detect and identify objects in images. Identifying an object is an image classification problem, something at which CNNs excel. But finding objects to identify poses a different challenge.
Object detection is challenging because objects aren’t assumed to be perfectly cropped and aligned as they are for image classification tasks. Nor are they limited to one per image. Figure 12-1 shows what a self-driving car might see as it scans video frames from a forward-pointing camera. A CNN trained to do conventional image classification using carefully prepared training images is powerless to help. It might be able to classify the image as one of a city street, but it can’t determine that the image contains cars, people, and traffic lights, much less pinpoint their locations.
Object detection has grown in speed and accuracy in recent years, and state-of-the-art methods rely on deep learning. In particular, they employ CNNs, which were introduced in Chapter 10. Let’s discuss how CNNs do object detection and identification and try our hand at it using cutting-edge object detection models.
Get Applied Machine Learning and AI for Engineers now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.