July 2019
Beginner to intermediate
740 pages
16h 52m
English
When faced with a class imbalance in our data, we may want to balance the training data before we build a model on it. To do this, we can use one of the following sampling techniques for imbalanced data: over-sampling the minority class or under-sampling the majority class.
With over-sampling, we draw a larger proportion from the minority class so that its count comes closer to that of the majority class; this may involve a technique such as bootstrapping (sampling with replacement), or generating new data similar to the values in the existing data (using machine learning algorithms such as nearest neighbors). Under-sampling, on the other hand, will take less data overall by reducing the amount taken from the majority ...
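The two ideas can be illustrated with a minimal sketch that uses only the Python standard library. The helper names (`group_by_class`, `over_sample`, `under_sample`), the toy `rows` data, and the choice to balance every class to exactly the largest or smallest class size are assumptions made for illustration, not the text's own implementation. The over-sampling here is plain bootstrapping; generating new points with nearest neighbors is not shown.

```python
import random
from collections import Counter


def group_by_class(rows, label_of):
    """Group rows into lists keyed by their class label."""
    groups = {}
    for row in rows:
        groups.setdefault(label_of(row), []).append(row)
    return groups


def over_sample(rows, label_of, seed=0):
    """Bootstrap smaller classes up to the size of the largest class."""
    rng = random.Random(seed)
    groups = group_by_class(rows, label_of)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Make up the shortfall by sampling with replacement (bootstrapping).
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


def under_sample(rows, label_of, seed=0):
    """Cut larger classes down to the size of the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(rows, label_of)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        # Sample without replacement so no row is repeated.
        balanced.extend(rng.sample(group, k=target))
    return balanced


def get_label(row):
    """Return the class label stored in the second field of a row."""
    return row[1]


# Hypothetical toy data: 90 majority rows and 10 minority rows.
rows = [(i, "majority") for i in range(90)]
rows += [(i, "minority") for i in range(90, 100)]

over_counts = Counter(get_label(row) for row in over_sample(rows, get_label))
under_counts = Counter(get_label(row) for row in under_sample(rows, get_label))
print(over_counts)
print(under_counts)
```

Over-sampling keeps every original row and repeats some minority rows, so the result is larger than the input; under-sampling discards majority rows, so the result is smaller. In either case, resampling would normally be applied to the training data only, as the text notes.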