5

Regularization with Data

Even though there are plenty of regularization methods for models (each model having its own set of hyperparameters), the most effective regularization sometimes comes from the data itself. Indeed, even the most powerful model can perform poorly if the data is not properly transformed beforehand.

In this chapter, we’ll look at some methods that help regularize models through the data:

  • Hashing high-cardinality features
  • Aggregating features
  • Undersampling an imbalanced dataset
  • Oversampling an imbalanced dataset
  • Resampling imbalanced data with SMOTE
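As a minimal sketch, assuming only the Python standard library, the following illustrates the idea behind three of the techniques listed above: hashing a high-cardinality categorical feature into a fixed number of buckets, randomly undersampling the larger classes, and randomly oversampling the smaller ones. The helper names `hash_feature`, `random_undersample`, and `random_oversample`, as well as the bucket count of 8, are hypothetical choices for illustration, not the implementation used in this chapter's recipes.

```python
import hashlib
import random
from collections import Counter


def hash_feature(value: str, n_buckets: int = 8) -> int:
    """Map a category (possibly one of millions) to one of n_buckets indices.

    md5 is used instead of the built-in hash(), because hash() on strings
    is salted per process and would give different buckets across runs.
    """
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def random_undersample(X, y, seed=0):
    """Randomly keep, for every class, as many samples as the smallest class has."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        indices = [i for i, lab in enumerate(y) if lab == label]
        # Sample without replacement so no row is kept twice.
        for i in rng.sample(indices, target):
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Duplicate random samples of the smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        indices = [i for i, lab in enumerate(y) if lab == label]
        # Sample with replacement: existing minority rows are simply repeated.
        for i in rng.choices(indices, k=target - count):
            X_out.append(X[i])
            y_out.append(y[i])
    return X_out, y_out


if __name__ == "__main__":
    cities = ["Paris", "Lyon", "Tokyo", "Lima"]
    print([hash_feature(c) for c in cities])  # bucket indices in [0, 8)

    X = list(range(10))
    y = [0] * 8 + [1] * 2  # imbalanced: 8 vs 2
    print(Counter(random_undersample(X, y)[1]))  # 2 samples per class
    print(Counter(random_oversample(X, y)[1]))   # 8 samples per class
```

Hashing trades a few collisions for a small, fixed feature space. Undersampling throws away majority-class data, while naive oversampling only repeats existing minority rows, which can encourage overfitting to them; SMOTE, also listed above, instead synthesizes new minority samples by interpolating between neighbors and is not shown here.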

Technical requirements

In this chapter, you will apply several transformations to data, resample datasets, and download new data via the command line. ...
