In this example, we will load a pipeline model and then evaluate the test data:
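from pyspark.ml import PipelineModel
from pyspark.mllib.evaluation import BinaryClassificationMetrics

savedModel = PipelineModel.load(logRegDirfilename)
predictions = savedModel.transform(testData)
# BinaryClassificationMetrics expects an RDD of (score, label) pairs, in that order
predictionAndLabels = predictions.select("prediction", "label").rdd.map(lambda r: (float(r[0]), float(r[1])))
metrics = BinaryClassificationMetrics(predictionAndLabels)
print("Area under ROC = %s" % metrics.areaUnderROC)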
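To make the reported metric more concrete, here is a minimal sketch in plain Python (no Spark required) of what an area-under-ROC value measures. It uses the rank-based formulation: the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half. The function name area_under_roc and the toy data are hypothetical and purely illustrative; this is not how Spark computes the value internally.

```python
def area_under_roc(score_and_labels):
    """Return the area under the ROC curve for (score, label) pairs.

    Rank-based formulation: the share of (positive, negative) pairs in
    which the positive example has the higher score; ties count as 0.5.
    """
    positives = [s for s, y in score_and_labels if y == 1]
    negatives = [s for s, y in score_and_labels if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: (score, label) pairs, same order Spark expects.
pairs = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.2, 0), (0.1, 0)]
print("Area under ROC = %s" % area_under_roc(pairs))
```

A value of 1.0 means every positive example outranks every negative one, while 0.5 corresponds to random ordering. Note that when hard 0/1 predictions are used as scores, as in the snippet above, many pairs tie, so the metric is coarser than it would be with predicted probabilities.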
In the next step, we define a random forest regression model:
randForest = RandomForestRegressor(featuresCol='indexedFeatures', labelCol='label',
                                   featureSubsetStrategy="auto", impurity='variance', maxBins=100)
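The impurity='variance' setting means that candidate splits in each regression tree are scored by how much they reduce the variance of the labels. The following is a minimal plain-Python sketch of that idea, under the assumption that the gain of a split is the parent variance minus the size-weighted variance of the two children. The helper names variance and variance_reduction and the sample labels are hypothetical; this is an illustration, not Spark's implementation.

```python
def variance(values):
    """Population variance of a non-empty list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_reduction(left, right):
    """Gain of a split: parent variance minus weighted child variances."""
    parent = left + right
    n = len(parent)
    weighted_children = (len(left) / n) * variance(left) \
        + (len(right) / n) * variance(right)
    return variance(parent) - weighted_children


# Hypothetical labels falling on each side of a candidate split.
labels_left = [1.0, 1.0, 2.0]
labels_right = [8.0, 9.0, 10.0]
gain = variance_reduction(labels_left, labels_right)
print("Variance reduction = %s" % gain)
```

A split that separates low labels from high labels, as in this toy data, yields a large reduction; a split that leaves both sides with the same spread as the parent yields a reduction near zero.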
Now, we will define a modeling pipeline that includes formulas, feature transformations, and an estimator:
pipeline = Pipeline(stages=[regFormula, ...