So, does applying these transformations have any impact on model performance? As an example, let's evaluate the metrics we used previously on the log-transformed data.
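To keep the comparison concrete, here is a minimal plain-Python sketch of three common regression metrics computed over (actual, predicted) pairs. It assumes that the metrics used previously are mean squared error (MSE), mean absolute error (MAE) and root mean squared log error (RMSLE). The function names are hypothetical helpers, and a Python list stands in for the RDD of pairs.

```python
import math

# Hypothetical helpers: each takes a list of (actual, predicted) tuples.
# Assumption: MSE, MAE and RMSLE are the metrics used earlier.

def mean_squared_error(pairs):
    """Average of the squared differences between predicted and actual values."""
    return sum((p - a) ** 2 for a, p in pairs) / len(pairs)

def mean_absolute_error(pairs):
    """Average of the absolute differences between predicted and actual values."""
    return sum(abs(p - a) for a, p in pairs) / len(pairs)

def root_mean_squared_log_error(pairs):
    """Square root of the mean squared difference of log(x + 1) values."""
    # math.log1p(x) computes log(x + 1), which is defined for x > -1.
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(a)) ** 2 for a, p in pairs) / len(pairs)
    )

pairs = [(10.0, 12.0), (20.0, 18.0), (30.0, 33.0)]
print(mean_squared_error(pairs), mean_absolute_error(pairs),
      root_mean_squared_log_error(pairs))
```

Computing all three on the same pairs makes it easy to compare a model trained on the raw target with one trained on the transformed target.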
We will do this first for the linear model by applying the log function to the label field of each LabeledPoint in the RDD. Here, we will only transform the target variable; we will not apply any transformations to the features.
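The label transformation can be sketched in plain Python as follows. In PySpark this would be a `map` over the RDD that builds a new `LabeledPoint(np.log(lp.label), lp.features)` for each point; here a hypothetical `namedtuple` stands in for `LabeledPoint` and a list stands in for the RDD. The sample labels and features are made up for illustration, and the log assumes every label is strictly positive.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for Spark's LabeledPoint: a label plus a feature vector.
LabeledPoint = namedtuple("LabeledPoint", ["label", "features"])

def log_transform_labels(points):
    """Return new points whose label is log(label); features are left unchanged.

    Assumes every label is strictly positive, since log is undefined otherwise.
    """
    return [LabeledPoint(math.log(p.label), p.features) for p in points]

# Made-up example data (not from the original dataset).
data = [LabeledPoint(16.0, [1.0, 0.5]), LabeledPoint(40.0, [2.0, 0.1])]
data_log = log_transform_labels(data)
print(data_log)
```

Because only the label field is rebuilt, the feature vectors are reused as-is, which matches the choice to leave the features untransformed.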
We will then train a model on this transformed data and form the RDD of predicted versus true values.
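The training and pairing step can be sketched as below. As a stand-in for the MLlib linear model, this uses closed-form ordinary least squares on a single feature (a swapped-in technique, not Spark's SGD-based training), and a list replaces the RDD. Since the model learns log-scale targets, both the true label and the prediction are passed through `exp` so the pairs end up on the original scale. `LabeledPoint`, `fit_simple_linear` and the sample data are hypothetical.

```python
import math
from collections import namedtuple

# Hypothetical stand-in for Spark's LabeledPoint.
LabeledPoint = namedtuple("LabeledPoint", ["label", "features"])

def fit_simple_linear(points):
    """Closed-form least squares for y = w * x + b using only the first feature.

    A stand-in for an SGD-trained linear model; returns a predict function.
    """
    xs = [p.features[0] for p in points]
    ys = [p.label for p in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x
    return lambda features: w * features[0] + b

# Made-up data whose labels are already log-transformed counts.
raw = [(1.0, 10.0), (2.0, 20.0), (3.0, 40.0)]
data_log = [LabeledPoint(math.log(count), [x]) for x, count in raw]

model_log = fit_simple_linear(data_log)

# Pairs of (true, predicted), mapped back from the log scale with exp.
true_vs_predicted = [
    (math.exp(p.label), math.exp(model_log(p.features))) for p in data_log
]
print(true_vs_predicted)
```

In this toy data the log counts lie exactly on a line, so the back-transformed predictions match the true counts; on real data the pairs would differ, and the metrics would then be computed on these original-scale values.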
Note that, because we have transformed the target variable, the model's predictions will be on the log scale, as will the target values of the transformed dataset. Therefore, in order to use our ...