As you have noticed, neural networks have a huge number of trainable parameters (weights and biases). Even small MLPs, such as the ones we are working with, have hundreds or thousands of parameters; our current network has more than 22,000. This is one of the reasons neural networks overfit so easily, and it is why we often have to apply some kind of regularization technique.
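To make that count concrete, here is a small sketch of how the weights and biases of plain dense layers add up. The layer sizes below are hypothetical, chosen only to illustrate how quickly the total grows; the actual architecture in this book may differ.

```python
def count_dense_params(layer_sizes):
    # Each dense layer contributes n_in * n_out weights plus n_out biases.
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical sizes for illustration only.
print(count_dense_params([784, 28, 10]))  # 21980 + 290 = 22270
```

Even this modest two-layer configuration already exceeds 22,000 parameters, almost all of them in the first weight matrix.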
One of the most popular techniques for neural network regularization is dropout, an intuitively simple approach proposed by Hinton et al. (2012). Geoffrey E. Hinton is one of the key researchers in the development of deep learning theory and techniques. The idea is straightforward: at each step of the training ...
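Concretely, dropout randomly deactivates a fraction of the neurons at every training step, so the network cannot rely on any single unit. Below is a minimal NumPy sketch of the common "inverted" dropout variant, which scales the surviving activations at training time (the original Hinton et al. (2012) formulation instead rescales weights at test time). The function name `dropout` and the `drop_rate` parameter are illustrative, not taken from this book's code.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def dropout(activations, drop_rate=0.5, training=True):
    # At inference time (or with drop_rate 0), dropout is a no-op.
    if not training or drop_rate == 0.0:
        return activations
    keep_prob = 1.0 - drop_rate
    # Bernoulli mask: each unit survives with probability keep_prob.
    mask = rng.binomial(1, keep_prob, size=activations.shape)
    # "Inverted" scaling keeps the expected activation unchanged,
    # so no rescaling is needed at inference time.
    return activations * mask / keep_prob

# Example: on average, half of the units are zeroed each call.
layer_output = np.array([[0.2, -1.3, 0.7, 0.05, 1.1]])
print(dropout(layer_output, drop_rate=0.5))
```

Because each training step sees a different random mask, the network effectively trains an ensemble of thinned sub-networks, which is what gives dropout its regularizing effect.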