Now it is time to run the first transformer, which is actually an estimator. It is a StringIndexer, and it has to build an internal mapping table between strings and indexes from the data before it can transform anything. That fitting step is what makes it an estimator rather than a pure transformer:
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}

var indexer = new StringIndexer()
  .setHandleInvalid("skip")
  .setInputCol("L0_S22_F545")
  .setOutputCol("L0_S22_F545Index")

var indexed = indexer.fit(df_notnull).transform(df_notnull)
indexed.printSchema
As the following image clearly shows, an additional column called L0_S22_F545Index has been created:
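To make the fit/transform split more concrete, here is a minimal plain-Scala sketch of the idea, with no Spark required. It is not Spark's implementation. The object name `StringIndexSketch`, the helpers `fitIndex` and `transformSkip`, and the sample labels are all hypothetical and exist only for illustration. The sketch assumes the default StringIndexer ordering, in which the most frequent label receives index 0.0; breaking ties alphabetically is a choice made for this sketch.

```scala
// Illustrative sketch only: this is NOT how Spark implements StringIndexer.
// fitIndex and transformSkip are hypothetical helper names.
object StringIndexSketch {

  // "fit": learn the label -> index table from the data.
  // Labels are ordered by descending frequency (ties broken alphabetically
  // in this sketch), so the most frequent label gets index 0.0.
  def fitIndex(values: Seq[String]): Map[String, Double] =
    values
      .groupBy(identity)
      .map { case (label, occurrences) => (label, occurrences.size) }
      .toSeq
      .sortBy { case (label, count) => (-count, label) }
      .map { case (label, _) => label }
      .zipWithIndex
      .map { case (label, i) => label -> i.toDouble }
      .toMap

  // "transform": look each value up in the learned table.
  // Values missing from the table are dropped, mimicking handleInvalid = "skip".
  def transformSkip(values: Seq[String],
                    table: Map[String, Double]): Seq[(String, Double)] =
    values.flatMap(v => table.get(v).map(idx => (v, idx)))

  def main(args: Array[String]): Unit = {
    // Hypothetical sample labels; "T9" never appears during fitting.
    val table = fitIndex(Seq("T1", "T2", "T1", "T3", "T1", "T2"))
    println(table)
    println(transformSkip(Seq("T2", "T9", "T1"), table))
  }
}
```

In this sketch, T1 occurs three times and is mapped to 0.0, T2 to 1.0, and T3 to 2.0. The unseen label T9 is silently dropped during the transform step, which is the behavior requested by setHandleInvalid("skip") in the Spark code above.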
Finally, let's examine ...