As we mentioned earlier, one of the interesting additions to Spark 2.0.0 is the ML pipeline. A pipeline is nothing but a linear graph of transformers and estimators. If we look at the classes we have been using, each one is either a transformer or an estimator. We had a decent pipeline for our classification example, as follows:
We started with Passengers, which was the Dataset that we read in.
The algTree object was the algorithm object.
We would have created a pipeline:
val treePipeline = new Pipeline().setStages(Array(indexer, assembler, algTree))
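To see the whole flow in one place, here is a minimal Scala sketch under stated assumptions, not the original code of our example. The Passengers Dataset is replaced by a few hypothetical toy rows. indexer is assumed to be a StringIndexer on the Survived label, which also attaches the class metadata a DecisionTreeClassifier expects, and assembler is assumed to be a VectorAssembler over hypothetical numeric columns Pclass, Age and Fare. The stage array is the same as in treePipeline above, and the sketch also checks which stages are estimators and which are transformers.

```scala
import org.apache.spark.ml.{Estimator, Pipeline, PipelineModel, Transformer}
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object TreePipelineSketch {
  // Builds the three-stage pipeline, fits it, and returns the class names of the fitted stages.
  def run(spark: SparkSession): Array[String] = {
    import spark.implicits._

    // Hypothetical stand-in for the Passengers Dataset (toy rows, not real data).
    val passengers = Seq(
      (0.0, 3.0, 22.0, 7.25),
      (1.0, 1.0, 38.0, 71.28),
      (1.0, 3.0, 26.0, 7.92),
      (1.0, 1.0, 35.0, 53.10),
      (0.0, 3.0, 35.0, 8.05),
      (0.0, 2.0, 54.0, 51.86)
    ).toDF("Survived", "Pclass", "Age", "Fare")

    // Estimator (assumed role): learns a label-to-index mapping and attaches class metadata.
    val indexer = new StringIndexer()
      .setInputCol("Survived")
      .setOutputCol("SurvivedIndex")

    // Transformer (assumed role): packs the numeric columns into one feature vector.
    val assembler = new VectorAssembler()
      .setInputCols(Array("Pclass", "Age", "Fare"))
      .setOutputCol("features")

    // Estimator: the decision tree algorithm object.
    val algTree = new DecisionTreeClassifier()
      .setLabelCol("SurvivedIndex")
      .setFeaturesCol("features")

    println(s"indexer is an Estimator: ${indexer.isInstanceOf[Estimator[_]]}")
    println(s"assembler is a Transformer: ${assembler.isInstanceOf[Transformer]}")
    println(s"algTree is an Estimator: ${algTree.isInstanceOf[Estimator[_]]}")

    // The pipeline itself is an Estimator; fit() runs the stages in order.
    val treePipeline = new Pipeline().setStages(Array(indexer, assembler, algTree))
    val model: PipelineModel = treePipeline.fit(passengers)

    // The fitted PipelineModel is a Transformer that adds a "prediction" column.
    model.transform(passengers).select("Survived", "SurvivedIndex", "prediction").show()

    // Every stage of the fitted model is a Transformer.
    model.stages.map(_.getClass.getSimpleName)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TreePipelineSketch")
      .master("local[*]")
      .getOrCreate()
    try run(spark).foreach(println)
    finally spark.stop()
  }
}
```

Calling fit on the pipeline runs the stages in order: estimators are fitted and replaced by the models they produce, while transformers are passed through unchanged. That is why the stages of the resulting PipelineModel are all transformers, and the whole model can be applied to new data with a single transform call.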
Then, we would ...