The most widespread regularization methods are L2-regularization, dropout, and batch normalization. Let's take a look:
- L2-regularization (weight decay) penalizes the weights with the highest values. The penalty is applied by minimizing their L2-norm, using the parameter $\lambda$, a regularization coefficient that expresses how strongly we prefer minimizing the norm relative to minimizing the loss on the training set. That is, for each weight, $w$, we add the term, $\frac{\lambda}{2} w^2$, to the loss function (the factor $\frac{1}{2}$ is ...
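
To make the L2 penalty concrete, here is a minimal NumPy sketch. It assumes the penalty is summed over all weights. The names `l2_penalty`, `l2_penalty_grad`, `lam`, and the placeholder `data_loss` value are illustrative only and do not come from any particular library.

```python
import numpy as np

def l2_penalty(weights, lam):
    """Return (lam / 2) * sum of squared weights over all weight arrays."""
    return 0.5 * lam * sum(np.sum(w ** 2) for w in weights)

def l2_penalty_grad(w, lam):
    """Gradient of (lam / 2) * w**2 with respect to w, which is lam * w."""
    return lam * w

# Hypothetical weights of a tiny two-layer model (illustrative values).
weights = [np.array([[1.0, -2.0], [0.5, 0.0]]), np.array([3.0])]
lam = 0.01        # regularization coefficient (assumed value)
data_loss = 0.25  # placeholder for the loss measured on the training set

# The regularized loss is the training loss plus the L2 penalty.
total_loss = data_loss + l2_penalty(weights, lam)

# Larger weights receive a proportionally larger pull toward zero.
grads = [l2_penalty_grad(w, lam) for w in weights]
```

Because the gradient of the penalty is proportional to the weight itself, each update step shrinks large weights more than small ones, which is where the name weight decay comes from.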