
Effective Amazon Machine Learning by Alexis Perrier


L1 regularization and Lasso

L1 regularization usually entails some loss of the model's predictive power: the penalty biases the weights toward zero in exchange for a simpler, sparser model.

One of the properties of L1 regularization is that it forces the smallest weights to zero, thereby reducing the number of features the model takes into account. This is desirable when the number of features (n) is large compared to the number of samples (N), which makes L1 regularization better suited to datasets with many features.
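The mechanism that drives the smallest weights exactly to zero is the soft-thresholding (proximal) operator associated with the L1 norm. The following short sketch (names are illustrative, not from the original text) shows how it zeroes out weights whose magnitude falls below the threshold while merely shrinking the larger ones:

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of the L1 norm: shrink each weight toward zero
    by t, and set any weight whose magnitude is below t exactly to zero."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

weights = np.array([2.5, -0.3, 0.05, -1.2, 0.01])
print(soft_threshold(weights, 0.1))
# the two smallest weights (0.05 and 0.01) become exactly 0
```

Larger weights survive with their magnitude reduced by the threshold, which is the "shrinkage" in the Lasso's name.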

Linear regression with L1 regularization is known as the Least Absolute Shrinkage and Selection Operator (Lasso) algorithm; the model is still trained with Stochastic Gradient Descent.

In both cases the hyper-parameters of the model are as follows:

  • The learning rate of the SGD algorithm
  • A regularization parameter that tunes the amount of regularization added to the model
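A minimal pure-NumPy sketch of SGD with an L1 penalty illustrates both hyper-parameters. The data, the hyper-parameter values, and all names here are hypothetical, chosen only to show how the regularization drives the weights of uninformative features toward zero:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: only the first 2 of 10 features are informative.
N, n = 200, 10
X = rng.normal(size=(N, n))
true_w = np.zeros(n)
true_w[:2] = [3.0, -2.0]
y = X @ true_w + 0.1 * rng.normal(size=N)

# Hypothetical hyper-parameter values, for illustration only.
learning_rate = 0.01   # step size of the SGD updates
lam = 0.1              # amount of L1 regularization

w = np.zeros(n)
for epoch in range(50):
    for i in rng.permutation(N):
        grad = (X[i] @ w - y[i]) * X[i]   # gradient of the squared loss
        w -= learning_rate * grad         # plain SGD step
        # L1 step: shrink every weight toward zero, zeroing the smallest
        w = np.sign(w) * np.maximum(np.abs(w) - learning_rate * lam, 0.0)

print(np.round(w, 2))  # weights on the 8 noise features end up near 0
```

Raising `lam` zeroes out more weights (stronger feature selection); raising the learning rate speeds convergence at the risk of instability, which is why both are tuned together.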
