To solve these problems, the authors of the paper propose a new type of network building block, called a capsule, instead of the neuron. The output of a neuron is a scalar value. In contrast, the output of a capsule is a vector (a list of values), which consists of the following:
- The elements of the vector represent the pose and other properties of the object.
- The length of the vector is in the (0, 1) range and represents the probability of detecting the feature at that location. As a reminder, the length of a vector is , where vi are the vector elements.
Let's consider a capsule, which detects faces. If we start moving a face across ...