July 2019
Beginner to intermediate
740 pages
16h 52m
English
As we hinted at earlier, under-sampling reduces the amount of data available to train our model on, so we should only attempt it when we have enough data that we can afford to discard some of it. Let's see what happens with the red wine quality data, where we don't have much data to begin with. We will use the RandomUnderSampler class from imblearn to randomly under-sample the low-quality red wines (the majority class) in the training set:
>>> from imblearn.under_sampling import RandomUnderSampler
>>> X_train_undersampled, y_train_undersampled = RandomUnderSampler(
...     random_state=0
... ).fit_resample(r_X_train, r_y_train)
We went from almost 14% of the training data being high-quality red wine to 50% of it; however, notice that this came ...
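To make the cost of under-sampling concrete, here is a minimal sketch of random under-sampling written with NumPy only. It is not imblearn's implementation: the function name random_undersample and the 1,000-row synthetic dataset (with exactly 14% of rows in the minority class) are hypothetical, chosen only to mirror the proportions described above. The idea is to keep every class at the size of the smallest one by sampling rows without replacement.

```python
import numpy as np


def random_undersample(X, y, random_state=0):
    """Hypothetical, simplified stand-in for random under-sampling:
    randomly keep only as many rows of each class as the smallest
    class has, sampling without replacement."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    # For each class, pick n_min of its row indices at random.
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]


# Hypothetical data: 1,000 rows, 140 (14%) of them in the minority class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = np.array([1] * 140 + [0] * 860)

X_res, y_res = random_undersample(X, y)
print(f"before: {len(y)} rows, {y.mean():.0%} class 1")
print(f"after:  {len(y_res)} rows, {y_res.mean():.0%} class 1")
```

This prints 1,000 rows at 14% class 1 before and 280 rows at 50% class 1 after. Balancing the classes this way threw away 720 of the 1,000 rows (72%), which is why under-sampling is only a good idea when data is plentiful.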