How does it work?
A pipeline is a sequence of stages, where each stage is either a Transformer or an Estimator. The stages are run in order, and the input DataFrame is transformed as it passes through each stage:
- Transformer stages: the transform() method is called on the DataFrame
- Estimator stages: the fit() method is called on the DataFrame, producing a Model (which is itself a Transformer)
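The distinction between the two stage types can be sketched as follows. This is a conceptual fragment, not runnable on its own: it assumes an existing DataFrame df, a Transformer someTransformer, and an Estimator someEstimator, all hypothetical names.

```scala
// Transformer: maps a DataFrame to a new DataFrame (e.g. adds a column).
val transformed = someTransformer.transform(df)

// Estimator: learns from a DataFrame and returns a Model,
// which is itself a Transformer and can then be applied.
val model = someEstimator.fit(df)
val predictions = model.transform(df)
```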
A pipeline is created by declaring its stages, configuring the appropriate parameters, and then chaining the stages together in a Pipeline object. For example, to create a simple text-classification pipeline, we would tokenize the text into words, use the hashing term-frequency feature extractor to extract features, and then fit a logistic regression model.
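The classification pipeline just described can be sketched in Scala using Spark ML's Tokenizer, HashingTF, and LogisticRegression stages. The DataFrame name training and its column names ("text", "label") are assumptions for illustration; the parameter values are arbitrary examples.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}

// Assumes a DataFrame `training` with a "text" column and a "label" column.
val tokenizer = new Tokenizer()
  .setInputCol("text")
  .setOutputCol("words")

val hashingTF = new HashingTF()
  .setInputCol(tokenizer.getOutputCol)
  .setOutputCol("features")

val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.001)

// Chain the stages into a single Pipeline object.
val pipeline = new Pipeline()
  .setStages(Array(tokenizer, hashingTF, lr))

// fit() runs the Transformer stages via transform() and the final
// Estimator via fit(), producing a PipelineModel (itself a Transformer).
val model = pipeline.fit(training)
```

Because the fitted PipelineModel is a Transformer, the same object can later be applied to test data with model.transform(testData).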
Please ensure that you add the Apache Spark ML JAR either in the ...