5

Cost-Sensitive Learning

So far, we have studied various sampling techniques and ways to oversample or undersample data. However, both of these techniques have their own unique set of issues. For example, oversampling can easily lead to overfitting of the model due to the exact or very similar examples being seen repeatedly. Similarly, with undersampling, we lose some information (that could have been useful for the model) because we discard the majority class examples to balance the training dataset. In this chapter, we’ll consider an alternative to the data-level techniques that we learned about previously.

Cost-sensitive learning is an effective strategy to tackle imbalanced data. We will go through this technique and learn why it can be ...

Get Machine Learning for Imbalanced Data now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.