Let's define buildSalesOrders, which invokes the read method on our SparkSession instance and caches the resulting DataFrame. We will call this method later from the RecSystem object:
def buildSalesOrders(dataSet: String): DataFrame = {
  session.read
    .format("com.databricks.spark.csv")
    .option("header", true)
    .schema(salesOrderSchema)
    .option("nullValue", "")
    .option("treatEmptyValuesAsNulls", "true")
    .load(dataSet)
    .cache()
}
Next up, let's build a sales leads DataFrame in the same way:
def buildSalesLeads(dataSet: String): DataFrame = {
  session.read
    .format("com.databricks.spark.csv")
    .option("header", true)
    .schema(salesLeadSchema)
    .option("nullValue", "")
    .option("treatEmptyValuesAsNulls", "true")
    .load(dataSet)
    .cache()
}
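The two builders differ only in the schema they pass to the reader, so one possible refactoring is to move the shared reader chain into a single helper. The following is a minimal sketch of that idea, not the code used in this chapter: the names CsvLoading, readCsv, CsvLoadingDemo, demoSchema, and buildDemo are hypothetical, the two-column schema is invented purely for illustration, and the short format name "csv" assumes Spark 2.0 or later, where the CSV source is built in.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical trait; `session` stands in for the SparkSession used above.
trait CsvLoading {
  def session: SparkSession

  // Shared reader chain: callers supply only the path and the schema.
  protected def readCsv(dataSet: String, schema: StructType): DataFrame =
    session.read
      .format("csv") // assumption: Spark 2.0+ built-in CSV source
      .option("header", "true")
      .schema(schema)
      .option("nullValue", "")
      .option("treatEmptyValuesAsNulls", "true")
      .load(dataSet)
      .cache() // marks the DataFrame for caching; it is materialized on the first action
}

// Hypothetical demo object with an invented two-column schema.
object CsvLoadingDemo extends CsvLoading {
  lazy val session: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("csv-loading-demo")
    .getOrCreate()

  val demoSchema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = true),
    StructField("name", StringType, nullable = true)
  ))

  def buildDemo(dataSet: String): DataFrame = readCsv(dataSet, demoSchema)
}
```

With such a helper in place, buildSalesOrders and buildSalesLeads would each reduce to a one-line call that passes salesOrderSchema or salesLeadSchema, respectively. Whether that is worth the extra indirection for only two builders is a judgment call.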
This completes the trait. Overall, it looks ...