We encountered parameter norm penalties in Chapter 7, Linear Models, as L1 regularization, which corresponds to lasso regression, and L2 regularization, which corresponds to ridge regression.
In the context of DL, parameter norm penalties similarly modify the objective function by adding a term that represents the L1 or L2 norm of the parameters, weighted by a hyperparameter that requires tuning. For neural networks, the bias parameters are usually not constrained, only the weights. Sometimes different penalties or hyperparameter values are used for different layers, but the added tuning complexity quickly becomes prohibitive.
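As a minimal sketch of how this looks in practice (assuming TensorFlow's Keras API and a hypothetical input dimension of 30), the snippet below attaches an L2 penalty to the weights of one dense layer and an L1 penalty to another via kernel_regularizer, leaving the bias terms unconstrained; the penalty weights (1e-4 and 1e-5 here) are the hyperparameters that would need tuning:

```python
# A sketch, not a prescribed architecture: per-layer weight penalties in Keras.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l1, l2

model = Sequential([
    Dense(64, activation='relu', input_shape=(30,),   # input dim is assumed
          kernel_regularizer=l2(1e-4)),               # L2 penalty on this layer's weights only
    Dense(32, activation='relu',
          kernel_regularizer=l1(1e-5)),               # a different (L1) penalty for another layer
    Dense(1)                                          # output layer left unregularized
])
model.compile(optimizer='adam', loss='mse')
```

Note that kernel_regularizer penalizes only the layer's weight matrix; the biases stay unconstrained unless a bias_regularizer is added explicitly, which matches the usual practice described above.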
L2 regularization preserves directions along which the parameters contribute significantly to reducing the ...