In this section, we'll describe the structure of the capsule network that the authors used to classify the MNIST dataset. The input of the network is a 28 x 28 greyscale MNIST image, which is processed in the following steps:
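- We'll start with a single convolutional layer with 256 filters of size 9 x 9, stride 1, and ReLU activation. With no padding, each spatial dimension shrinks from 28 to 28 - 9 + 1 = 20, so the shape of the output volume is (256, 20, 20).
- Next comes a second convolutional layer with 256 filters of size 9 x 9 and stride 2. Each spatial dimension shrinks from 20 to floor((20 - 9) / 2) + 1 = 6, so the shape of the output volume is (256, 6, 6).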
- Use the output of the second convolutional layer as a foundation for the first capsule layer, called PrimaryCaps. Take the (256, 6, 6) output volume and split it into 32 separate (8, 6, 6) blocks. That is, each of the 32 blocks contains eight 6 ...
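
The shapes quoted in the steps above can be checked with a few lines of arithmetic. The following is only a minimal sketch, not the authors' code: the helper `conv_output_size` is a hypothetical name introduced here, and it assumes that both convolutions use no padding, which is what the stated output shapes imply.

```python
def conv_output_size(input_size: int, kernel_size: int, stride: int = 1) -> int:
    """Spatial size of a convolution output, assuming no padding."""
    return (input_size - kernel_size) // stride + 1


# First convolution: 256 filters, 9 x 9, stride 1, applied to a 28 x 28 image
conv1_size = conv_output_size(28, 9, stride=1)
conv1_shape = (256, conv1_size, conv1_size)

# Second convolution: 256 filters, 9 x 9, stride 2
conv2_size = conv_output_size(conv1_size, 9, stride=2)
conv2_shape = (256, conv2_size, conv2_size)

# PrimaryCaps: split the 256 channels into 32 blocks of equal depth
num_blocks = 32
channels_per_block = conv2_shape[0] // num_blocks
block_shape = (channels_per_block, conv2_size, conv2_size)

print(conv1_shape)  # (256, 20, 20)
print(conv2_shape)  # (256, 6, 6)
print(block_shape)  # (8, 6, 6)
```

Note that the stride-2 convolution relies on integer division: (20 - 9) / 2 is 5.5, which is rounded down to 5 before adding 1.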