Chapter 5. Convolutional Neural Networks

In this chapter, we’ll cover convolutional neural networks (CNNs). CNNs are the standard neural network architecture for prediction when the input observations are images, as they are in a wide range of neural network applications. So far in the book, we’ve focused exclusively on fully connected neural networks, which we implemented as a series of Dense layers. Thus, we’ll start this chapter by reviewing some key elements of these networks and use that review to motivate why we might want a different architecture for images. We’ll then cover CNNs the same way we’ve introduced other concepts in this book: we’ll first discuss how they work at a high level, then discuss them at a lower level, and finally show in detail how they work by coding up the convolution operation from scratch.1 By the end of this chapter, you’ll understand how CNNs work thoroughly enough to use them both to solve problems and to learn about advanced CNN variants, such as ResNets, DenseNets, and Octave Convolutions, on your own.

Neural Networks and Representation Learning

Neural networks initially receive data on observations, with each observation represented by some number of features, n. So far we’ve seen two examples of this in two very different domains: the first was the house prices dataset, where each observation was made up of 13 features, each of which represented a numeric characteristic about that ...
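To make this layout concrete, here is a minimal sketch (not code from the book) contrasting the flat, observations-by-features representation a fully connected network expects with the representation of a batch of images; the array shapes and sizes below are illustrative assumptions, not values taken from the text.

```python
import numpy as np

# Illustrative sketch: how a batch of observations is laid out for a fully
# connected network versus for image data. Sizes are assumptions for clarity.

# Tabular data: each observation is a flat vector of n features,
# e.g. a hypothetical batch of 64 houses with 13 numeric features each.
tabular_batch = np.random.randn(64, 13)       # shape: (batch_size, num_features)

# Image data: each observation is a grid of pixels,
# e.g. a hypothetical batch of 64 grayscale images, 28 x 28 pixels each.
image_batch = np.random.randn(64, 1, 28, 28)  # shape: (batch_size, channels, height, width)

# To feed images through a series of Dense layers, we would have to flatten
# each image into one long feature vector, discarding its spatial structure.
flattened = image_batch.reshape(64, -1)       # shape: (64, 784)

print(tabular_batch.shape, image_batch.shape, flattened.shape)
```

The point of the contrast is that flattening treats every pixel as an independent feature, which is exactly the assumption CNNs are designed to avoid.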
