August 2019
Intermediate to advanced
342 pages
9h 35m
English
We conclude by showing how a rebalancing technique based on oversampling can help improve the accuracy of predictions.
We will use the SMOTE oversampling implementation provided by the imbalanced-learn library to increase the number of fraud samples from 102 to 500, and then reuse RandomForestClassifier on the resampled data, as shown in the following example:
from collections import Counter
from imblearn.over_sampling import SMOTE

x = df[['Time', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10',
        'V11', 'V12', 'V13', 'V14', 'V15', 'V16', 'V17', 'V18', 'V19', 'V20',
        'V21', 'V22', 'V23', 'V24', 'V25', 'V26', 'V27', 'V28', 'Amount']]
y = df['Class']

# Increase ...
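To make the idea behind SMOTE more concrete, the following is a minimal sketch of the interpolation step, written with the standard library only. It is a simplified illustration, not imbalanced-learn's implementation: the function name smote_sketch, its parameters, and the two-dimensional toy data are all hypothetical and stand in for the credit card dataset used in the text. Each synthetic fraud sample is placed at a random position on the segment joining an existing fraud sample to one of its nearest fraud neighbours.

```python
import math
import random
from collections import Counter


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Simplified illustration of the SMOTE idea (hypothetical helper),
    not the imbalanced-learn implementation.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 102 "fraud" points and 1,000 "legitimate" points in 2-D.
data_rng = random.Random(42)
fraud = [(data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)) for _ in range(102)]
legit = [(data_rng.gauss(0, 1.0), data_rng.gauss(0, 1.0)) for _ in range(1000)]

# Grow the minority class from 102 to 500 samples, as in the text.
fraud_resampled = fraud + smote_sketch(fraud, n_new=500 - len(fraud))

x_res = legit + fraud_resampled
y_res = [0] * len(legit) + [1] * len(fraud_resampled)
print(Counter(y_res))
```

Because every synthetic point is a blend of two real fraud samples, the new samples stay inside the region already occupied by the minority class instead of being exact duplicates, which is the main difference from naive random oversampling.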