July 2018
Let's invoke the read method on our SparkSession instance to load the sales orders dataset, and cache the resulting DataFrame. We will call this method later from the RecSystem object:
def buildSalesOrders(dataSet: String): DataFrame = {
  session.read
    .format("com.databricks.spark.csv")
    .option("header", true)
    .schema(salesOrderSchema)
    .option("nullValue", "")
    .option("treatEmptyValuesAsNulls", "true")
    .load(dataSet)
    .cache()
}
Next up, let's build a sales leads dataframe:
def buildSalesLeads(dataSet: String): DataFrame = {
  session.read
    .format("com.databricks.spark.csv")
    .option("header", true)
    .schema(salesLeadSchema)
    .option("nullValue", "")
    .option("treatEmptyValuesAsNulls", "true")
    .load(dataSet)
    .cache()
}
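The two builders above differ only in the schema they pass in, so the shared read-and-cache pattern could be factored into one helper. The following is a minimal sketch of that idea, not the book's code: the object name CsvLoadingSketch, the local session, the helper readCsv, and the columns in exampleSchema are all hypothetical, and it assumes Spark 2.x or later, where the built-in "csv" format is available.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{DoubleType, IntegerType, StringType, StructField, StructType}

object CsvLoadingSketch {
  // Hypothetical local session; the book's trait supplies its own `session`.
  lazy val session: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("csv-loading-sketch")
    .getOrCreate()

  // Hypothetical schema for illustration only; the real salesOrderSchema
  // and salesLeadSchema are defined elsewhere in the trait.
  val exampleSchema: StructType = StructType(Seq(
    StructField("CustomerID", StringType, nullable = true),
    StructField("Quantity", IntegerType, nullable = true),
    StructField("UnitPrice", DoubleType, nullable = true)
  ))

  // Shared reader: treats the first row as a header, applies an explicit
  // schema, reads empty fields as null, and marks the result for caching.
  def readCsv(path: String, schema: StructType): DataFrame =
    session.read
      .format("csv")
      .option("header", "true")
      .schema(schema)
      .option("nullValue", "")
      .load(path)
      .cache()
}
```

Note that cache() is lazy: it only marks the DataFrame for caching, and the data is materialized the first time an action such as count() runs on it.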
This completes the trait. Overall, it looks ...