In this section, we'll address the downsides of the sliding window approach by using a convolutional sliding window, and build some intuition for how this technique works.
Before we delve into this new method, we need to modify the convolutional architecture that we've used so far.
Here is a typical CNN:
We have the input, a red, green, and blue (RGB) image with three channels; here we'll use a small 32 x 32 image. This is followed by a convolution that leaves the first two dimensions unchanged and increases the number of channels to 64. The max pooling layer then divides the first two dimensions by 2 and leaves the number ...
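To make the shape bookkeeping for the first two layers concrete, here is a minimal sketch in plain Python. It only tracks tensor shapes and does not build a real network. The helper names `conv_same_shape` and `max_pool_shape` are hypothetical, and the sketch assumes a stride-1 convolution with "same" padding (which is one way to leave the first two dimensions unchanged) and a 2 x 2 pooling window.

```python
def conv_same_shape(shape, out_channels):
    """Output shape of a convolution that preserves height and width.

    Assumption: stride 1 with 'same' padding, so only the channel
    count changes. The kernel size does not affect the result here.
    """
    height, width, _ = shape
    return (height, width, out_channels)


def max_pool_shape(shape, pool=2):
    """Output shape of max pooling with a pool x pool window.

    Assumption: the stride equals the window size, so height and
    width are divided by `pool`. Pooling acts on each channel
    separately, so the channel count stays the same.
    """
    height, width, channels = shape
    return (height // pool, width // pool, channels)


input_shape = (32, 32, 3)                       # small RGB image
after_conv = conv_same_shape(input_shape, 64)   # channels: 3 -> 64
after_pool = max_pool_shape(after_conv)         # height, width halved

print(after_conv)  # (32, 32, 64)
print(after_pool)  # (16, 16, 64)
```

Tracking shapes this way is a quick sanity check before writing the layers in an actual deep learning framework.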