January 2019
Intermediate to advanced
386 pages
11h 13m
English
The first technique we are going to discuss is weight decay (also known as L2 regularization). It works by adding an additional term to the value of the loss function. Without going into too much detail, we'll say that this term is a function of all the weights of the network, typically the sum of their squares scaled by a small coefficient. This means that, if the network weights have large values, the loss function increases. In effect, weight decay penalizes large network weights, so each update pulls them toward zero (hence the name). This prevents the network from relying too heavily on a few features associated with these weights. There is less chance of overfitting when the network is forced to work with multiple features. In practical terms, we can add weight decay by changing the weight update rule we introduced in Chapter ...
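To make the idea concrete, here is a minimal sketch of how a penalty on large weights can show up in a plain gradient descent update. It is an illustration under stated assumptions, not necessarily the exact rule used in this book: the function names `l2_penalty` and `sgd_step`, the parameter names `lr` and `weight_decay`, and the factor of one half in the penalty are all choices made for this example.

```python
def l2_penalty(weights, weight_decay):
    """Extra loss term (assumed form): (weight_decay / 2) * sum of squared weights."""
    return 0.5 * weight_decay * sum(w * w for w in weights)


def sgd_step(weights, grads, lr, weight_decay=0.0):
    """One plain gradient descent step on (loss + l2_penalty).

    The gradient of the penalty with respect to a weight w is weight_decay * w,
    so each weight is shrunk toward zero on top of the usual gradient update.
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


# Hypothetical values chosen only for illustration.
weights = [2.0, -0.5]
grads = [0.0, 0.0]  # a zero data gradient isolates the effect of the decay term

print("penalty:", l2_penalty(weights, weight_decay=0.5))
print("updated:", sgd_step(weights, grads, lr=0.1, weight_decay=0.5))
```

With the data gradient set to zero, the only change comes from the decay term: both weights move toward zero, and the larger weight moves further in absolute terms. Setting `weight_decay=0.0` recovers the ordinary update.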