We started by defining a Seq data structure to hold the training data as a series of pairs, each consisting of a label and a feature vector. We then converted the data structure to a DataFrame and ran it through Estimator.fit() to produce a model that fits the data. We examined the model's parameters and the DataFrame schemas to understand the resulting model. We then combined .select() and .predict() to decompose the DataFrame before looping over the results to display the predictions.
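The workflow above can be sketched as follows. This is a minimal, self-contained illustration: the data values, column names, and hyperparameters are assumptions, and it surfaces predictions by calling model.transform() and selecting the prediction column, which is the conventional DataFrame-based route:

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object LogisticRegressionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("lr-sketch")
      .getOrCreate()
    import spark.implicits._

    // A Seq of (label, feature vector) pairs, converted to a DataFrame.
    // The values here are purely illustrative.
    val training = Seq(
      (1.0, Vectors.dense(0.0, 1.1, 0.1)),
      (0.0, Vectors.dense(2.0, 1.0, -1.0)),
      (0.0, Vectors.dense(2.0, 1.3, 1.0)),
      (1.0, Vectors.dense(0.0, 1.2, -0.5))
    ).toDF("label", "features")

    // Estimator.fit() produces a fitted model from the DataFrame.
    val lr = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.01)
    val model = lr.fit(training)

    // Examine the model's parameters.
    println(s"Coefficients: ${model.coefficients} Intercept: ${model.intercept}")

    // Decompose the scored DataFrame and loop to display each prediction.
    model.transform(training)
      .select("features", "label", "prediction")
      .collect()
      .foreach(println)

    spark.stop()
  }
}
```

Running it prints the fitted coefficients and intercept, then one row per training example with its predicted label alongside the actual one.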
While we don't have to use pipelines (a workflow concept that Spark ML borrowed from scikit-learn, http://scikit-learn.org/stable/index.html) to run a regression, we decided to expose you to the power of Spark ML pipelines and logistic regression algorithms in ...