Step 4 - Exploratory analysis of the input data

As described earlier, the dataset contains numerical input variables V1 to V28, which are the result of a PCA transformation of the original features. The response variable Class tells us whether a transaction was fraudulent (value = 1) or not (value = 0).

There are two additional features, Time and Amount. The Time column gives the number of seconds elapsed between the current transaction and the first transaction in the dataset, whereas the Amount column gives how much money was transferred in the transaction. Figure 6 gives a glimpse of the input data (only V1, V2, V26, and V27 are shown):

Figure 6: A snapshot of the credit card fraud detection dataset
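
To make the column layout concrete, here is a minimal plain-Scala sketch of how rows shaped like Time, V1..Vn, Amount, Class could be parsed and tallied by class. It is not the loading code used in this project: the names Transaction, ExploreTransactions, parse, classCounts, and sampleLines are hypothetical, the sample rows are made-up placeholders rather than values from the real dataset, and only two V columns are included for brevity.

```scala
// Illustrative sketch only: names and sample rows are assumptions, not the book's code.
final case class Transaction(
  time: Double,             // seconds elapsed since the first transaction
  features: Vector[Double], // the PCA-transformed columns (V1..Vn)
  amount: Double,           // money transferred in this transaction
  label: Int                // Class: 1 = fraudulent, 0 = not fraudulent
)

object ExploreTransactions {
  // Parse one comma-separated line laid out as: Time, V1..Vn, Amount, Class
  def parse(line: String): Transaction = {
    val cols = line.split(",").map(_.trim)
    Transaction(
      time = cols.head.toDouble,
      features = cols.slice(1, cols.length - 2).map(_.toDouble).toVector,
      amount = cols(cols.length - 2).toDouble,
      label = cols.last.toInt
    )
  }

  // Count how many transactions fall under each Class value
  def classCounts(txs: Seq[Transaction]): Map[Int, Int] =
    txs.groupBy(_.label).map { case (label, rows) => label -> rows.size }

  // Made-up placeholder rows (Time, V1, V2, Amount, Class)
  val sampleLines: Seq[String] = Seq(
    "0.0,-1.2,0.3,149.50,0",
    "1.0,0.8,-0.4,2.75,0",
    "5.0,-3.1,2.2,500.00,1"
  )

  def main(args: Array[String]): Unit = {
    val txs = sampleLines.map(parse)
    println(s"Transactions per class: ${classCounts(txs)}")
    println(s"Largest amount: ${txs.map(_.amount).max}")
  }
}
```

With the real file, the same parse function would see 28 V columns per row instead of two, since it takes every column between Time and Amount as a feature. It assumes unquoted numeric fields and no header row, so a header line would have to be skipped first.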

We have been able to load the transaction, ...
