July 2017
Intermediate to advanced
796 pages
18h 55m
English
Inverse Document Frequency (IDF) is an Estimator: it is fit on a dataset and produces a model that rescales the input feature vectors, down-weighting terms that appear in many documents. IDF therefore operates on the output of a HashingTF Transformer.
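To make the fit-then-scale idea concrete, here is a minimal plain-Scala sketch that does not depend on Spark. It assumes the smoothed formula idf = log((m + 1) / (df + 1)) described in the Spark MLlib documentation, where m is the number of documents and df is the number of documents containing the term. The names IdfSketch, fitIdf, and transform, as well as the sample term counts, are hypothetical and exist only for illustration; they are not Spark's implementation.

```scala
// Plain-Scala sketch (no Spark dependency) of smoothed IDF weighting.
// All names and data here are illustrative assumptions, not Spark's API.
object IdfSketch {
  // Term-frequency vectors: one Array per document, one slot per hashed term.
  val docs: Seq[Array[Double]] = Seq(
    Array(1.0, 0.0, 2.0),
    Array(1.0, 1.0, 0.0),
    Array(1.0, 0.0, 0.0)
  )

  // "fit": for each term index, count the documents that contain the term,
  // then compute idf = log((m + 1) / (df + 1)), with m = number of documents.
  def fitIdf(data: Seq[Array[Double]]): Array[Double] = {
    val m = data.size
    val numTerms = data.head.length
    Array.tabulate(numTerms) { j =>
      val df = data.count(doc => doc(j) > 0.0)
      math.log((m + 1.0) / (df + 1.0))
    }
  }

  // "transform": scale each term frequency by its fitted IDF weight.
  def transform(idf: Array[Double], doc: Array[Double]): Array[Double] =
    doc.zip(idf).map { case (tf, w) => tf * w }
}
```

In this sketch, the term at index 0 appears in every document, so its weight is log(4 / 4) = 0 and it contributes nothing after scaling, while rarer terms keep a positive weight.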
To use IDF, you need to import the class:
import org.apache.spark.ml.feature.IDF
First, you need to initialize an IDF, specifying the input column and the output column. Here, we choose the rawFeatures column created by HashingTF as the input and generate an output column named features:
scala> val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features")
idf: org.apache.spark.ml.feature.IDF = idf_d8f9ab7e398e
Next, invoking the fit() function on the input dataset yields an IDFModel, which is the output Transformer:
scala> ...