Mastering Machine Learning with R - Second Edition
by Cory Lesmeister, Doug Ortiz, Vikram Dhillon, Miroslav Kopecky
LASSO
LASSO applies the L1-norm rather than the L2-norm used in ridge regression. The L1-norm is the sum of the absolute values of the feature weights, so LASSO minimizes RSS + λ(sum |Bj|). Unlike the ridge penalty, this shrinkage penalty can force a feature weight to exactly zero. This is a clear advantage over ridge regression, as it may greatly improve model interpretability.
The mathematics behind why the L1-norm allows the weights/coefficients to become zero is outside the scope of this book (refer to Tibshirani, 1996 for further details).
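Without going into the derivation, the effect can be seen numerically in one special case. The following is only a minimal sketch in base R. It assumes orthonormal features, in which case minimizing RSS + λ(sum |Bj|) reduces to soft-thresholding each least-squares estimate at λ/2, while the ridge objective RSS + λ(sum Bj^2) divides each estimate by 1 + λ. The function names, the least-squares estimates, and the value of λ are hypothetical and chosen purely for illustration.

```r
# LASSO under the orthonormal-feature assumption: soft-thresholding.
# Estimates whose magnitude is at most lambda / 2 are set to zero.
soft_threshold <- function(b, lambda) {
  sign(b) * pmax(abs(b) - lambda / 2, 0)
}

# Ridge under the same assumption: every estimate is scaled by 1 / (1 + lambda),
# so a nonzero estimate shrinks but never reaches zero.
ridge_shrink <- function(b, lambda) {
  b / (1 + lambda)
}

ols    <- c(3.0, -1.5, 0.4, -0.2)  # hypothetical least-squares estimates
lambda <- 1                        # hypothetical penalty strength

lasso_coef <- soft_threshold(ols, lambda)
ridge_coef <- ridge_shrink(ols, lambda)

print(lasso_coef)  # the two smallest estimates are zeroed out
print(ridge_coef)  # all four estimates remain nonzero
```

In this sketch the two small estimates drop out of the LASSO fit entirely, while ridge keeps all four features with reduced weights. Real data rarely has orthonormal features, so a fitted model will not follow these closed forms exactly.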
If LASSO is so great, then ridge regression must be clearly obsolete. Not so fast! In a situation of high collinearity or high pairwise correlations, LASSO may force a predictive feature to zero and thus you can lose the ...