Ridge regression adds a shrinkage penalty to the ordinary least squares loss function that limits the squared L2 norm of the weight vector:

$$L(w) = \|Xw - y\|_2^2 + \alpha\|w\|_2^2$$
In this case, X is a matrix containing all samples as rows and the term w represents the weight vector. The additional term, scaled by the coefficient alpha (larger values imply stronger regularization and therefore smaller weights), prevents the unbounded growth of w that can be caused by multicollinearity or ill-conditioning. The following figure shows what happens when a Ridge penalty is applied:
The gray surface represents the loss function ...
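As a minimal sketch of this effect (the two-feature dataset and the alpha values are illustrative assumptions, using scikit-learn's LinearRegression and Ridge), the following example fits two almost collinear features and shows how increasing alpha shrinks the weights:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Hypothetical dataset with two nearly collinear features
rng = np.random.RandomState(0)
x1 = rng.randn(100)
x2 = x1 + 0.01 * rng.randn(100)    # almost a copy of x1 -> multicollinearity
X = np.column_stack([x1, x2])      # samples as rows, features as columns
y = 3.0 * x1 + 0.5 * rng.randn(100)

# OLS: the ill-conditioned X can make the two weights large and unstable
ols = LinearRegression().fit(X, y)
print("OLS weights:", ols.coef_)

# Ridge: the alpha * ||w||^2 penalty shrinks the weights toward zero;
# larger alpha means stronger regularization and smaller weights
for alpha in (0.1, 1.0, 10.0):
    ridge = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:>4}: weights = {ridge.coef_}")
```

Because the two features carry nearly the same information, only the sum of their weights is well determined by the data alone; the penalty resolves this ambiguity by preferring the solution with the smallest norm.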