Using data to understand statistical regularization

Variable selection is an important step in statistical modeling: by eliminating variables unrelated to the output, it makes models simpler to understand, easier to train, and free of spurious associations.

Variable selection is one possible approach to dealing with the problem of overfitting. In general, we don't expect a model to fit our data perfectly; in fact, fitting the training data too closely can reduce a predictive model's accuracy on unseen data, which is precisely the problem of overfitting.
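To make the overfitting idea concrete, here is a minimal sketch in Python with NumPy. The data, the noise level, and the polynomial degrees are all assumptions chosen for illustration and do not come from the text: the data is generated from a straight line plus noise, and a simple model (degree 1) is compared with a very flexible one (degree 10).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical data: a straight line (slope 2) plus Gaussian noise.
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 2.0 * x + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = make_data(15)   # small training set
x_test, y_test = make_data(200)    # unseen data from the same process

def mse(coeffs, x, y):
    # Mean squared error of a fitted polynomial on (x, y).
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 10):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares fit
    results[degree] = (mse(coeffs, x_train, y_train),
                       mse(coeffs, x_test, y_test))
    print(f"degree {degree:2d}: train MSE = {results[degree][0]:.4f}, "
          f"test MSE = {results[degree][1]:.4f}")
```

The degree-10 polynomial always fits the training points at least as closely as the straight line, because it contains the line as a special case. In a setup like this its error on the unseen data is typically worse, since the extra flexibility is spent on chasing noise.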

Regularization is an alternative to variable selection for reducing the number of variables in the data ...
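As an illustration of how regularization can reduce the number of variables, the sketch below uses the lasso (an L1 penalty), fitted by a hand-written coordinate descent loop in NumPy. The excerpt does not name a specific method, so the choice of the lasso is an assumption, as are the synthetic data, the penalty strength `alpha`, and the helper names `soft_threshold` and `lasso_coordinate_descent`.

```python
import numpy as np

def soft_threshold(value, threshold):
    # Shrink value toward zero; anything within the threshold becomes 0.
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    # Minimizes (1 / (2n)) * ||y - X @ beta||^2 + alpha * ||beta||_1
    # by updating one coefficient at a time (no intercept term).
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with variable j's current contribution added back.
            partial_residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial_residual / n
            z = X[:, j] @ X[:, j] / n
            beta[j] = soft_threshold(rho, alpha) / z
    return beta

# Hypothetical data: only variables 0 and 3 actually affect the output.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
true_beta = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ true_beta + rng.normal(scale=0.5, size=200)

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]          # no penalty
beta_lasso = lasso_coordinate_descent(X, y, alpha=0.3)   # L1 penalty

print("least squares:", np.round(beta_ols, 3))
print("lasso:        ", np.round(beta_lasso, 3))
```

Ordinary least squares gives every variable a small nonzero coefficient, while the L1 penalty sets the coefficients of the unrelated variables to exactly zero and shrinks the remaining ones. In that sense the penalty performs variable selection as a side effect of fitting.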
