In an empty FraudDetectionPipeline.scala file, add the following imports, which we need for logging, feature-vector creation, DataFrames, and the SparkSession, respectively:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.{DataFrame, SparkSession}
This trait is central to the pipeline: it holds a method for creating the SparkSession, along with other shared code. Classes that extend this trait can all share a single SparkSession instance:
trait FraudDetectionWrapper {
Next, we define the path to the training dataset, which will be used (with cross-validation) to fit our classifier:
val trainSetFileName = "training.csv"
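To make the idea concrete, here is a minimal sketch of what such a trait might look like once the SparkSession creation is filled in. The member name `session`, the application name, and the `local[*]` master are assumptions for illustration, not the book's final code:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: a trait whose lazy val builds the SparkSession on first
// access, so every class that mixes it in shares the same instance.
trait FraudDetectionWrapper {

  // Path to the training data, as defined above.
  val trainSetFileName = "training.csv"

  // Lazily created, shared SparkSession (names are assumptions).
  lazy val session: SparkSession = SparkSession
    .builder()
    .appName("fraud-detection-pipeline")
    .master("local[*]") // local mode for development; adjust for a cluster
    .getOrCreate()
}
```

Because the `val` is `lazy`, no Spark context is started until a class mixing in the trait actually touches `session`, and `getOrCreate()` guarantees that repeated accesses return the same session rather than starting a new one.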
The entry point to programming ...