Convolutional layers allow us to preserve the original image dimensions, thereby improving our ability to learn features and reduce our computational load. They do this by utilizing something called a filter, which slides across the image, learning features by computing dot products. For example, a typical filter on the first layer of a CNN might have size 5 x 5 x 3 (namely, 5 pixels width and height, and 3 because images have a depth of three colors, RGB).
Mathematically, it's done as follows:
Here, w represents our filter, but it also represents our learnable weights. We take a transpose of our input filter, ...