In this exercise, we will train a random forest classifier. First, we index the categorical features and the labels, as spark.ml requires. Next, we assemble the feature columns into a single vector column, because every spark.ml machine learning algorithm expects its features in that form. Finally, we train the random forest on a training Dataset. Optionally, we can also unindex the labels to make them more readable.
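To make the data-preparation steps above concrete, here is a minimal sketch in plain Python of what indexing, assembling, and unindexing do to the data. It is not the spark.ml API: the helpers `fit_string_index` and `assemble` and the toy rows are hypothetical, written only for illustration. The ordering (most frequent value gets index 0, ties broken alphabetically) is meant to mirror the default behavior of spark.ml's StringIndexer, and the training step itself is left out.

```python
from collections import Counter


def fit_string_index(values):
    """Map each distinct string to an index, most frequent first.

    Ties are broken alphabetically. Indices are floats because
    indexed columns in spark.ml hold doubles.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: float(i) for i, value in enumerate(ordered)}


def assemble(row, feature_cols):
    """Concatenate the chosen numeric columns into one feature vector (a list here)."""
    return [float(row[c]) for c in feature_cols]


# Hypothetical toy rows: one categorical feature, one numeric feature, a string label.
rows = [
    {"color": "red", "size": 1.5, "label": "yes"},
    {"color": "blue", "size": 2.0, "label": "no"},
    {"color": "red", "size": 0.5, "label": "yes"},
]

# Step 1: index the categorical feature and the label.
color_index = fit_string_index(r["color"] for r in rows)
label_index = fit_string_index(r["label"] for r in rows)

# Step 2: assemble the (now numeric) feature columns into a single vector per row.
prepared = []
for r in rows:
    indexed = {"color_idx": color_index[r["color"]], "size": r["size"]}
    prepared.append({
        "features": assemble(indexed, ["color_idx", "size"]),
        "label_idx": label_index[r["label"]],
    })

# Optional last step: unindex the numeric labels back to readable strings.
index_to_label = {i: s for s, i in label_index.items()}
readable = [index_to_label[p["label_idx"]] for p in prepared]

print(prepared)
print(readable)
```

In spark.ml itself, these roles are played by Transformers such as StringIndexer, VectorAssembler, and IndexToString, which operate on Dataset columns rather than on Python dictionaries.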
There are several ready-to-use Transformers available to index categorical features. We can assemble all the features into one vector (using VectorAssembler) and then use a VectorIndexer to index it. The drawback of VectorIndexer is that it will index every feature that has ...