One cycle of training consists of a forward pass and backpropagation:
- For each input image, we first pass it through the convolutional layer. The convolved results are fed into the activation function (that is, CONV + ReLU).
- The resulting activation map is then aggregated by the max pooling function (POOL). Pooling produces a smaller map, which helps to reduce the number of features.
- The CONV (+ ReLU) and POOL layers are repeated several times before they are connected to the fully connected layers. Stacking them increases the depth of the network, which increases its capacity to model complex data. In addition, filters at different depths learn a hierarchical representation of the data at different levels. Please refer to ...
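
To make the CONV + ReLU and POOL steps concrete, here is a minimal pure-Python sketch of one such stage applied to a single-channel toy input. The helper names (`conv2d`, `relu`, `max_pool`), the 6x6 input, and the 3x3 edge filter are all assumptions chosen for illustration. A real network learns its filter values during backpropagation and would use a deep learning library rather than nested lists.

```python
# Illustrative sketch only: one CONV + ReLU + POOL stage on a single feature map.
# The function names, the toy image, and the kernel are hypothetical.

def conv2d(image, kernel):
    """Slide the kernel over the image with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Replace every negative value with zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A toy 6x6 "image" and a hand-picked vertical-edge filter (assumed values).
image = [[(r * 6 + c) % 5 for c in range(6)] for r in range(6)]
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]

activation = relu(conv2d(image, kernel))  # CONV + ReLU: 6x6 input -> 4x4 map
pooled = max_pool(activation)             # POOL: 4x4 map -> 2x2 map

print(len(activation), "x", len(activation[0]))  # 4 x 4
print(len(pooled), "x", len(pooled[0]))          # 2 x 2
```

The 4x4 activation map holds 16 values, while the pooled map holds only 4, which is the feature reduction described in the list. Repeating the stage on `pooled` with further filters would correspond to stacking more CONV and POOL layers.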