LASSO applies the L1-norm instead of the L2-norm used in ridge regression. The L1-norm is the sum of the absolute values of the feature weights, so LASSO minimizes RSS + λΣ|βj|. This shrinkage penalty can force some feature weights exactly to zero, effectively performing feature selection. This is a clear advantage over ridge regression, as it may improve the model's interpretability.
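As a minimal sketch of this behavior (not the book's own code), the following Python snippet fits LASSO and ridge with scikit-learn on simulated data; the data set, alpha value, and feature counts are illustrative assumptions. The alpha parameter plays the role of λ above.

```python
# Sketch: L1 (LASSO) can drive coefficients exactly to zero,
# while L2 (ridge) only shrinks them toward zero.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Simulated data: 10 features, only 3 of which are informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=42)

lasso = Lasso(alpha=1.0).fit(X, y)   # alpha corresponds to lambda
ridge = Ridge(alpha=1.0).fit(X, y)

print("LASSO coefficients:", np.round(lasso.coef_, 2))  # uninformative features land at 0
print("Ridge coefficients:", np.round(ridge.coef_, 2))  # small but nonzero
```

Inspecting the printed coefficients shows LASSO zeroing out the uninformative features, which is exactly the interpretability benefit described above.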
The mathematics behind why the L1-norm allows the weights/coefficients to become zero is beyond the scope of this book (refer to Tibshirani, 1996 for further details).
If LASSO is so great, then ridge regression must be obsolete in machine learning. Not so fast! In situations of high collinearity or high pairwise correlations, LASSO may force a predictive feature to zero, and hence you can lose the ...