In few-shot learning scenarios, it would be very easy for a high-capacity model (an NN with many layers and parameters) to overfit. Prototypical networks (introduced in the paper Prototypical Networks for Few-shot Learning, https://arxiv.org/abs/1703.05175) address this issue by computing a prototype vector for each class, based on the embeddings of all support samples of that class. The same network computes an embedding of each query sample as well. Then, we measure the distance between the query embedding and the class prototypes and assign the query to the class of the nearest prototype (more details on this later in the section).
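The following is a minimal sketch of this classification step, assuming an arbitrary embedding network `embed_fn` and Euclidean distance; it illustrates the prototype idea at inference time and is not the paper's full training procedure:

```python
import torch

def prototypical_classify(support_x, support_y, query_x, embed_fn, n_classes):
    """Assign each query sample to the class of its nearest prototype.

    support_x: (N, ...) support samples; support_y: (N,) integer class labels;
    query_x: (M, ...) query samples; embed_fn: the shared embedding network.
    """
    # Embed support and query samples with the same network
    support_emb = embed_fn(support_x)           # (N, D)
    query_emb = embed_fn(query_x)               # (M, D)

    # Prototype of each class = mean embedding of its support samples
    prototypes = torch.stack([
        support_emb[support_y == c].mean(dim=0) for c in range(n_classes)
    ])                                          # (n_classes, D)

    # Pairwise Euclidean distances between queries and prototypes
    dists = torch.cdist(query_emb, prototypes)  # (M, n_classes)

    # Nearest prototype determines the predicted class
    return dists.argmin(dim=1)


# Example: a 5-way 3-shot episode with a toy (hypothetical) embedding network
embed_fn = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 64))
support_x = torch.randn(15, 1, 28, 28)
support_y = torch.arange(5).repeat_interleave(3)
query_x = torch.randn(10, 1, 28, 28)
predictions = prototypical_classify(support_x, support_y, query_x, embed_fn, n_classes=5)
```

During training, the paper turns the negative distances into a softmax over classes and minimizes the negative log-probability of the true class; the snippet above only shows how a trained embedding is used to classify queries.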
Prototypical networks work for both zero-shot and few-shot learning, as illustrated in the following ...