In this part of the chapter, we use the Spark sample code to train a logistic regression model, save it, and evaluate its performance on a test dataset. The features and the class label are specified with Spark ML's RFormula, which selects columns using an R-style formula. The model is then trained with a pipeline that combines this formula stage with a logistic regression estimator. The estimator is created with the following code snippet:
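from pyspark.ml.classification import LogisticRegression
# Logistic regression estimator: at most 10 optimizer iterations, elastic-net regularization
logReg = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)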
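In Spark ML, `regParam` sets the overall regularization strength λ and `elasticNetParam` sets the mixing weight α between an L1 penalty (lasso, α = 1) and an L2 penalty (ridge, α = 0); Spark's documentation gives the penalty as λ(α‖w‖₁ + (1 − α)/2 ‖w‖₂²). With regParam=0.3 and elasticNetParam=0.8, most of the penalty weight is on the L1 term, which tends to push some coefficients to exactly zero. The short sketch below does not use Spark; it simply evaluates that expression for a made-up weight vector. The function `elastic_net_penalty` and the weights are hypothetical, for illustration only.

```python
# Hypothetical helper for illustration only (not part of Spark's API).
# Evaluates the elastic-net penalty described in the Spark ML docs:
#   reg_param * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)
def elastic_net_penalty(weights, reg_param, elastic_net_param):
    l1_norm = sum(abs(w) for w in weights)      # ||w||_1
    l2_norm_sq = sum(w * w for w in weights)    # ||w||_2^2
    alpha = elastic_net_param
    return reg_param * (alpha * l1_norm + (1 - alpha) / 2 * l2_norm_sq)


# Made-up coefficient vector, just to compare settings.
weights = [0.5, -1.0, 0.0, 2.0]

# Same settings as logReg: mostly L1 with some L2.
print(elastic_net_penalty(weights, reg_param=0.3, elastic_net_param=0.8))
# Pure L1 (lasso) and pure L2 (ridge) for comparison.
print(elastic_net_penalty(weights, reg_param=0.3, elastic_net_param=1.0))
print(elastic_net_penalty(weights, reg_param=0.3, elastic_net_param=0.0))
```

Raising `regParam` scales the whole penalty, while moving `elasticNetParam` toward 1 shifts it from the squared L2 norm toward the L1 norm.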
The following code sets up the training formula and assigns it to the classFormula variable:
from pyspark.ml.feature import RFormula
classFormula = RFormula(formula="tipped ~ pickup_hour + weekday + ...
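RFormula reads an R-style formula: the column to the left of `~` is the target, and the terms on the right (joined with `+`, removed with `-`, with `.` standing for all non-target columns) are the features. Spark then assembles the selected feature columns into a single vector column (named `features` by default), one-hot encoding string columns, and writes the target to a `label` column. The pure-Python sketch below only mimics the column-selection step for a subset of that syntax. `parse_formula` is a hypothetical helper, the columns `passenger_count` and `trip_distance` are assumed for illustration, and interactions (`:`) and intercept terms are deliberately not handled.

```python
# Hypothetical helper for illustration only; Spark's RFormula does much more
# (vector assembly, one-hot encoding, interactions, intercept handling).
def parse_formula(formula, columns):
    """Return (label, feature_columns) for a formula like 'y ~ a + b - c'.

    Supports '~', '+', '-' and '.', applied left to right. Column names are
    assumed not to contain '+', '-' or '~'.
    """
    label, rhs = (part.strip() for part in formula.split("~", 1))
    features = []
    # Turn each '-' into its own '+'-separated token, e.g. '. - a' -> '. + - a'.
    for token in rhs.replace("-", "+ -").split("+"):
        token = token.strip()
        if not token:
            continue
        if token.startswith("-"):
            # Remove a previously selected term.
            name = token[1:].strip()
            if name in features:
                features.remove(name)
        elif token == ".":
            # '.' selects every column except the label.
            for col in columns:
                if col != label and col not in features:
                    features.append(col)
        elif token not in features:
            features.append(token)
    return label, features


# Hypothetical taxi-trip schema; only tipped, pickup_hour and weekday
# come from the example formula.
columns = ["tipped", "pickup_hour", "weekday", "passenger_count", "trip_distance"]

print(parse_formula("tipped ~ pickup_hour + weekday", columns))
# ('tipped', ['pickup_hour', 'weekday'])
print(parse_formula("tipped ~ . - trip_distance", columns))
# ('tipped', ['pickup_hour', 'weekday', 'passenger_count'])
```

In the Spark version, the resulting `label` and `features` columns are exactly what the downstream LogisticRegression stage expects by default, which is why the two can be chained in a single pipeline.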