Chapter 10. Image Classification with Convolutional Neural Networks
Computer vision is a branch of deep learning in which computers discern information from images. Real-world uses include identifying objects in photos, removing inappropriate images from social media sites, counting the cars in line at a tollbooth, and recognizing faces in photos. Computer-vision models can even be combined with natural language processing (NLP) models to caption photos. I snapped a photo while on vacation and asked Azure’s Computer Vision service to caption it. The result is shown in Figure 10-1. It’s somewhat remarkable given that no human intervention was required.
The field of computer vision has advanced rapidly in recent years, mostly due to convolutional neural networks, also known as CNNs or ConvNets. In 2012, an eight-layer CNN called AlexNet outperformed traditional machine learning models entered in the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by achieving an error rate of 15.3% when identifying objects in photos. In 2015, ResNet-152 featuring a whopping 152 layers won the challenge with an error rate of just 3.5%, which exceeds a human’s ability to classify images featured in the competition.
CNNs are magical because they treat images as images rather than just arrays of pixel ...
Get Applied Machine Learning and AI for Engineers now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.