Machine Learning Pocket Reference

by Matt Harrison
August 2019
Intermediate to advanced
318 pages
4h 40m
English
O'Reilly Media, Inc.

Chapter 9. Imbalanced Classes

If you are classifying data and the classes are not roughly balanced in size, the bias toward the more common classes can carry over into your model. For example, if you have 1 positive case and 99 negative cases, you can get 99% accuracy simply by classifying everything as negative. There are various options for dealing with imbalanced classes.
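
The arithmetic in this example can be checked with a few lines of plain Python. This is only a minimal sketch: the `labels` list and the always-negative `predictions` are made-up illustrative data, not taken from the book.

```python
# Illustrative data: 1 positive case (1) and 99 negative cases (0).
labels = [1] + [0] * 99

# A "model" that ignores its input and always predicts the majority class.
predictions = [0] * len(labels)

# Accuracy: fraction of predictions that match the true label.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Count how many of the actual positives were identified.
positives_found = sum(
    1 for p, y in zip(predictions, labels) if y == 1 and p == 1
)

print(f"accuracy: {accuracy:.2f}")
print(f"positives found: {positives_found} of {sum(labels)}")
```

The accuracy comes out at 0.99 even though the single positive case is never identified, which is why accuracy alone can be misleading here.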

Use a Different Metric

One hint is to use a measure other than accuracy (AUC is a good choice) when evaluating and tuning models. Precision and recall are also better options when the class sizes differ. However, there are other options to consider as well.
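
To make the difference between these metrics concrete, here is a small sketch in plain Python that computes accuracy, precision, recall, and AUC on the same predictions. The helper names `precision_recall` and `roc_auc`, the data, and the 0.5 threshold are all assumptions chosen for illustration. Returning 0.0 when a metric is undefined is likewise a convention picked for this sketch.

```python
def precision_recall(labels, predictions):
    """Return (precision, recall) for binary labels where 1 is positive."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    # Fall back to 0.0 when the denominator is zero (a convention used here).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Comparing every pair is slow on large data
    but fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical data: 2 positives among 10 samples, with made-up scores.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.05]

# Threshold the scores at 0.5 to get hard predictions.
predictions = [1 if s >= 0.5 else 0 for s in scores]

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
precision, recall = precision_recall(labels, predictions)
auc = roc_auc(labels, scores)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} auc={auc:.4f}")
```

On this toy data the accuracy is 0.80, while precision and recall are both 0.50, so the weaker performance on the minority class only shows up in the latter two. In practice you would more likely use `roc_auc_score`, `precision_score`, and `recall_score` from `sklearn.metrics` than hand-written helpers.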

Tree-based Algorithms and Ensembles

Tree-based models may perform better depending on the distribution of the smaller class. If the minority samples tend to be clustered, they are easier to classify.

Ensemble methods can further aid in pulling out the minority classes. Bagging and boosting are ensemble techniques used by tree-based models: random forests use bagging, and Extreme Gradient Boosting (XGBoost) uses boosting.

Penalize Models

Many scikit-learn classification models support the class_weight parameter. Setting this to 'balanced' weights each class inversely proportional to its frequency, which penalizes errors on minority classes more heavily and incentivizes the model to classify them correctly. Alternatively, you can grid search over weight options by passing in a dictionary mapping class to weight (give higher weight to smaller classes).
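
The scikit-learn documentation describes the 'balanced' heuristic as n_samples / (n_classes * np.bincount(y)). The sketch below reproduces that formula in plain Python to show the weights it would assign. The helper name `balanced_class_weights` is hypothetical and the data is made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    This mirrors the heuristic scikit-learn documents for
    class_weight='balanced'. The helper itself is hypothetical.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count) for cls, count in counts.items()
    }


# Illustrative data: 1 positive case and 99 negative cases.
labels = [1] + [0] * 99
weights = balanced_class_weights(labels)

for cls in sorted(weights):
    print(f"class {cls}: weight {weights[cls]:.3f}")
```

With this data the rare positive class receives a weight of 50, while the negative class receives roughly 0.505. A dictionary of this shape, mapping class to weight, is the form class_weight accepts, for example `LogisticRegression(class_weight={0: 1, 1: 50})`. Treat any specific numbers as candidates for a grid search rather than recommended values.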

The XGBoost library has the max_delta_step parameter, which can be set from 1 to 10 to make the update step more conservative. ...

ISBN: 9781492047537