Since the fraudulent class is the one that matters, we need the following to help us choose the classifier with the best F1 score:
- Labeled data: the test DataFrame, testingDf
- PDF: a product of probabilities, computed in the probabilityDensity function
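To make the role of these two inputs concrete, here is a minimal sketch of how labeled points and their density values might be combined to pick the setting with the best F1 score. It assumes, without confirmation from the text, that a point is flagged as fraudulent when its density falls below a threshold, and that labels are encoded as 1 for fraud and 0 otherwise. The names `f1_score`, `best_threshold`, `densities`, and `labels` are hypothetical and stand in for values that would come from probabilityDensity and testingDf.

```python
def f1_score(labels, predictions):
    """F1 for the positive (fraud) class; labels and predictions are 0/1."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # true positives
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # false positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(densities, labels, steps=1000):
    """Scan evenly spaced thresholds between the smallest and largest
    density and return the (threshold, F1) pair with the highest F1.
    Assumption: a density below the threshold means 'fraud'."""
    lo, hi = min(densities), max(densities)
    best_eps, best_f1 = lo, 0.0
    for i in range(1, steps + 1):
        eps = lo + (hi - lo) * i / steps
        preds = [1 if d < eps else 0 for d in densities]
        score = f1_score(labels, preds)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1


# Hypothetical toy data: two very low densities are the labeled frauds.
densities = [0.30, 0.25, 0.28, 0.001, 0.27, 0.002, 0.31]
labels = [0, 0, 0, 1, 0, 1, 0]

eps, f1 = best_threshold(densities, labels)
print(f"best threshold = {eps:.6f}, F1 = {f1:.3f}")
```

The scan is only possible because every point carries a label: without the labels there would be no true or false positives to count, and therefore no F1 score to compare.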
Keeping in mind that we need labeled data (points) to arrive at the best F1 score, the following background information is helpful:
- What is the role of cross-validation? To understand cross-validation, we first revisit the validation process, where a subset of the samples from the training set is used to train the model. Cross-validation improves on validation because more observations are available for the model to be ...