Ridge, Lasso, and ElasticNet

Ridge regression adds a shrinkage penalty to the ordinary least squares loss function in order to limit the squared L2 norm of the weight vector:

$$L(w) = \lVert Xw - y \rVert_2^2 + \alpha \lVert w \rVert_2^2$$

In this case, X is a matrix containing all samples as rows, y is the vector of target values, and the term w represents the weight vector. The additional term (scaled by the coefficient alpha; a larger alpha implies stronger regularization and smaller weights) prevents w from growing without bound, as can otherwise happen under multicollinearity or ill-conditioning. The following figure shows what happens when a Ridge penalty is applied:

[Figure: the gray surface represents the loss function ...]
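To make the penalty concrete, here is a minimal sketch that evaluates the loss above directly with NumPy and then compares ordinary least squares against scikit-learn's Ridge estimator on deliberately collinear data. The dataset, the alpha value, and the ridge_loss helper are illustrative assumptions, not part of the original text:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

# Illustrative data: 100 samples as rows, 5 features, where column 1 is an
# almost exact copy of column 0, making X ill-conditioned (multicollinearity).
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + rng.normal(scale=1e-3, size=100)
true_w = np.array([1.5, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.5, size=100)

def ridge_loss(w, alpha):
    # L(w) = ||Xw - y||_2^2 + alpha * ||w||_2^2, matching the formula above
    return np.sum((X @ w - y) ** 2) + alpha * np.sum(w ** 2)

# fit_intercept=False keeps both fits consistent with the loss written above
# (no separate, unpenalized intercept term)
ols = LinearRegression(fit_intercept=False).fit(X, y)
ridge = Ridge(alpha=1.0, fit_intercept=False).fit(X, y)  # larger alpha, stronger shrinkage

# The collinear pair inflates the OLS weights; the penalty keeps Ridge's bounded
print('OLS   ||w||_2 =', np.linalg.norm(ols.coef_))
print('Ridge ||w||_2 =', np.linalg.norm(ridge.coef_))
print('Ridge loss at the fitted weights:', ridge_loss(ridge.coef_, alpha=1.0))
```

On data like this, the OLS weight norm is typically far larger because the two nearly identical columns receive large offsetting coefficients, while Ridge shrinks them toward a small shared value.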
