A convolution layer consists of three major stages, each of which imposes structural constraints on a multilayered network:
- Feature extraction: Each unit takes its inputs from a local receptive field in the previous layer, which forces the network to extract local features. For a 32 x 32 image and a 4 x 4 receptive field, each hidden unit is connected to 16 units in the previous layer. Assuming the field slides one pixel at a time with no padding, there are 32 - 4 + 1 = 29 positions along each axis, so there are 29 x 29 hidden units in total. The input layer therefore makes 29 x 29 x 16 connections to the hidden layer, which is also the number of parameters (one weight per connection) between these two layers. Had the hidden layer been fully connected, there would be 32 x 32 x 29 x 29 parameters. ...
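
The counting argument in the feature-extraction stage can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a square input, a square receptive field, a stride of 1, and no padding (none of which the text states explicitly); the function names `hidden_side` and `connection_counts` are hypothetical and chosen only for illustration.

```python
def hidden_side(input_side: int, field_side: int) -> int:
    """Number of receptive-field positions along one axis.

    Assumes stride 1 and no padding, so the field can start at
    input_side - field_side + 1 distinct offsets.
    """
    return input_side - field_side + 1


def connection_counts(input_side: int, field_side: int) -> dict:
    """Compare a locally connected hidden layer with a fully connected one."""
    side = hidden_side(input_side, field_side)
    hidden_units = side * side            # one hidden unit per field position
    fan_in = field_side * field_side      # inputs seen by each hidden unit
    return {
        "hidden_side": side,
        "hidden_units": hidden_units,
        "fan_in": fan_in,
        # One weight per connection, no weight sharing yet.
        "local_params": hidden_units * fan_in,
        # Every input pixel connected to every hidden unit.
        "dense_params": input_side * input_side * hidden_units,
    }


if __name__ == "__main__":
    for name, value in connection_counts(32, 4).items():
        print(f"{name}: {value}")
```

Under these assumptions the ratio `dense_params / local_params` equals the number of input pixels divided by the receptive-field area, which is the saving obtained from local connectivity alone, before any weight sharing is applied.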