Step 4 - Exploratory analysis of the input data

As described earlier, the dataset contains numerical input variables V1 to V28, which are the result of a PCA transformation of the original features. The response variable Class tells us whether a transaction was fraudulent (value = 1) or not (value = 0).

There are two additional features, Time and Amount. The Time column holds the number of seconds elapsed between the current transaction and the first transaction in the dataset, whereas the Amount column holds how much money was transferred in the transaction. Let's take a glimpse at the input data in Figure 6 (only V1, V2, V26, and V27 are shown):

Figure 6: A snapshot of the credit card fraud detection dataset
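
To make the meaning of the Time, Amount, and Class columns more concrete, here is a minimal sketch in plain Scala (no Spark or H2O). It assumes a simplified row type with only those three columns. The `Transaction` case class, the `ExploreSketch` object, its helper methods, and the handful of rows are all hypothetical and made up for illustration; they are not taken from the real dataset and are not the loading code used in this chapter.

```scala
// Hypothetical, simplified row: only Time, Amount, and Class are modeled.
// The PCA features V1 to V28 are left out to keep the sketch short.
final case class Transaction(time: Double, amount: Double, label: Int)

object ExploreSketch {
  // Made-up rows for illustration only; NOT values from the real dataset.
  val sample: Seq[Transaction] = Seq(
    Transaction(0.0, 10.0, 0),
    Transaction(5.0, 20.0, 0),
    Transaction(60.0, 100.0, 1),
    Transaction(90.0, 30.0, 0),
    Transaction(120.0, 300.0, 1)
  )

  // Number of rows per class (0 = genuine, 1 = fraudulent).
  def classCounts(rows: Seq[Transaction]): Map[Int, Int] =
    rows.groupBy(_.label).map { case (cls, group) => cls -> group.size }

  // Mean transferred amount per class.
  def meanAmountByClass(rows: Seq[Transaction]): Map[Int, Double] =
    rows.groupBy(_.label).map { case (cls, group) =>
      cls -> group.map(_.amount).sum / group.size
    }

  // Time is measured from the first transaction, so the largest Time value
  // is the number of seconds covered by the rows.
  def spanSeconds(rows: Seq[Transaction]): Double =
    if (rows.isEmpty) 0.0 else rows.map(_.time).max

  def main(args: Array[String]): Unit = {
    println(s"Rows per class:        ${classCounts(sample)}")
    println(s"Mean amount per class: ${meanAmountByClass(sample)}")
    println(s"Covered time span (s): ${spanSeconds(sample)}")
  }
}
```

On the real data, the same kind of per-class summary would typically be computed with a group-by on the loaded DataFrame rather than on an in-memory `Seq`; the sketch only illustrates how the three columns are interpreted.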

We have been able to load the transaction, ...
