- We are not going to perform feature engineering in the first instance. The dataset has already been reduced to 30 features (28 anonymized + Time + Amount).
- We compare what happens when using resampling and when not using it. We test this approach using a simple logistic regression classifier.
- We evaluate the models by using some of the performance metrics mentioned previously.
- We rerun the better of the two approaches (with or without resampling) while tuning the parameters of the logistic regression classifier.
- We build classification models using other classification algorithms.
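As a rough sketch of the comparison described in this plan, the snippet below fits a logistic regression with and without resampling and reports two metrics for each. Several things here are assumptions rather than the notebook's actual choices: the data is a synthetic stand-in generated with `make_classification`, the resampling technique is plain random undersampling (implemented by the hypothetical helper `undersample`), and recall and precision are used as example metrics.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split


def undersample(X, y, seed=0):
    """Hypothetical helper: randomly drop majority-class (label 0) rows
    until both classes have the same number of samples."""
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == 1)
    majority_idx = np.flatnonzero(y == 0)
    kept_majority = rng.choice(majority_idx, size=len(minority_idx), replace=False)
    idx = np.concatenate([minority_idx, kept_majority])
    rng.shuffle(idx)
    return X[idx], y[idx]


# Synthetic, heavily imbalanced stand-in for the real dataset (assumption).
X, y = make_classification(
    n_samples=20000, n_features=30, weights=[0.99, 0.01], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Resample the training split only; the test split keeps its original balance.
X_res, y_res = undersample(X_train, y_train)

training_sets = {
    "no resampling": (X_train, y_train),
    "undersampled": (X_res, y_res),
}

results = {}
for name, (X_fit, y_fit) in training_sets.items():
    clf = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
    y_pred = clf.predict(X_test)
    results[name] = {
        "recall": recall_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
    }
    print(name, results[name])
```

Both models are scored on the same untouched test split, so any difference in the metrics comes from how the training data was prepared.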
Setting our input and target variables + resampling:
- Normalize the Amount column
- The Amount column is not on the same scale as the anonymized features:
from sklearn.preprocessing ...
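One possible way to carry out this normalization is sketched below with `StandardScaler`, which rescales a column to zero mean and unit variance. The rows are made up for illustration, and the new column name `normAmount` is a hypothetical choice; only the `Amount` column name comes from the dataset description above.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# A few made-up rows standing in for the real data (assumption).
df = pd.DataFrame(
    {
        "V1": [-1.36, 1.19, -0.97, 0.50],
        "Amount": [149.62, 2.69, 378.66, 10.00],
    }
)

# StandardScaler expects a 2-D input, hence the double brackets.
scaler = StandardScaler()
df["normAmount"] = scaler.fit_transform(df[["Amount"]]).ravel()

# Drop the raw column so only the scaled version is used as a feature.
df = df.drop(columns=["Amount"])
print(df)
```

In a full pipeline it is usually safer to fit the scaler on the training split only and reuse it on the test split, so that no information from the test data leaks into training.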