July 2018
Intermediate to advanced
334 pages
8h 20m
English
In this step, we will extract the features of this dataset. We will do the following:
Check out the following code snippet:
import org.apache.spark.ml.feature.HashingTF

// setOutputCol is called twice below; the later call takes effect,
// so the output column is "mailFeatureHashes".
val hashMapper = new HashingTF()
  .setInputCol("words")
  .setOutputCol("noStopWordsMailFeatures")
  .setOutputCol("mailFeatureHashes")
  .setNumFeatures(10000)

hashFeatures: org.apache.spark.ml.feature.HashingTF = hashingTF_5ff221eac4b4
Next, we will transform the featured version of the noStopWordsDataFrame ...
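As a rough illustration of the transform step, here is a minimal sketch of how a HashingTF transformer might be applied to a DataFrame of tokenized mail text. It is not the book's own listing. The two-row dataset, the object name HashingTFSketch, the helper buildHasher, and the choice of "noStopWordsMailFeatures" as the input column are all assumptions made for this example; only the output column name and the feature count come from the snippet above.

```scala
import org.apache.spark.ml.feature.HashingTF
import org.apache.spark.sql.SparkSession

object HashingTFSketch {

  // Hypothetical helper: builds a hashing transformer with the same output
  // column and feature count as the snippet above. The input column name is
  // an assumption (a column holding stop-word-free tokens).
  def buildHasher(): HashingTF = new HashingTF()
    .setInputCol("noStopWordsMailFeatures")
    .setOutputCol("mailFeatureHashes")
    .setNumFeatures(10000)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("HashingTFSketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up stand-in for noStopWordsDataFrame: one array of tokens per mail.
    val noStopWordsDataFrame = Seq(
      (0, Seq("free", "offer", "click")),
      (1, Seq("meeting", "agenda", "monday"))
    ).toDF("id", "noStopWordsMailFeatures")

    // transform appends a sparse term-frequency vector column of size 10000.
    val hashed = buildHasher().transform(noStopWordsDataFrame)
    hashed.select("id", "mailFeatureHashes").show(truncate = false)

    spark.stop()
  }
}
```

Each token is hashed to an index below 10000, so the resulting vectors have a fixed length regardless of vocabulary size. Distinct tokens can collide on the same index, which is the usual trade-off when choosing a smaller setNumFeatures value.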