Now that we have prepared our training and test sets, we are ready to investigate the impact of different parameter settings on model performance. We will start with the linear model. To do this, we will create a convenience function that trains the model on the training set with a given parameter setting and then computes the relevant performance metric on the test set.
We will use the root mean squared log error (RMSLE) as our evaluation metric, since it is the one used in the Kaggle competition for this dataset. This allows us to compare our model results against the competition leaderboard to see how we perform.
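As a reminder of what RMSLE measures, the following is a minimal sketch in plain Scala that works on a local collection rather than an RDD. The function name `rmsle` and the `(actual, predicted)` pair layout are assumptions made for illustration and are not taken from the text. It also assumes that every value is greater than -1, so that the logarithm is defined.

```scala
// Hypothetical helper: RMSLE over (actual, predicted) pairs in a local collection.
// Assumes every value is greater than -1 so that log(1 + x) is defined.
def rmsle(pairs: Seq[(Double, Double)]): Double = {
  require(pairs.nonEmpty, "need at least one (actual, predicted) pair")
  val squaredLogErrors = pairs.map { case (actual, predicted) =>
    // log1p(x) computes log(1 + x), so zero counts need no special handling
    val diff = math.log1p(predicted) - math.log1p(actual)
    diff * diff
  }
  // Mean of the squared log errors, then the square root
  math.sqrt(squaredLogErrors.sum / squaredLogErrors.size)
}
```

Because the error is computed on log-transformed values, RMSLE penalizes relative rather than absolute differences, so a miss of 10 on a true value of 20 costs far more than a miss of 10 on a true value of 1,000.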
The evaluation function is defined here:
def evaluate(train: RDD[LabeledPoint],test: ...