July 2018
First, make sure the SpamClassifierPipeline class imports StopWordsRemover. Next, we create an instance of StopWordsRemover and set its input column parameter to mailFeatureWords. The output column will hold the same words with the stop words removed:
import org.apache.spark.ml.feature.StopWordsRemover
val stopWordRemover = new StopWordsRemover().setInputCol("mailFeatureWords").setOutputCol("noStopWordsMailFeatures")
Just like with mailTokenizer, we call the transform method to get a new noStopWordsDataFrame:
val noStopWordsDataFrame = stopWordRemover.transform(tokenizedBagOfWordsDataFrame)
The resulting DataFrame, a tokenized, non-null bag of lowercase words with no stop words, looks like this:
noStopWordsDataFrame.show()
+-----------------------+-----+ ...
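To try the stop word removal step in isolation, the following is a minimal sketch rather than the book's actual pipeline code. It assumes a local SparkSession and a small hand-made DataFrame standing in for tokenizedBagOfWordsDataFrame. The object name StopWordsRemoverSketch, the sample rows, and the label column are all hypothetical; only the mailFeatureWords and noStopWordsMailFeatures column names come from the text above.

```scala
import org.apache.spark.ml.feature.StopWordsRemover
import org.apache.spark.sql.SparkSession

// Hypothetical standalone driver; not part of SpamClassifierPipeline.
object StopWordsRemoverSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for experimenting.
    val spark = SparkSession.builder()
      .appName("StopWordsRemoverSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for tokenizedBagOfWordsDataFrame:
    // each row holds an array of lowercase tokens and an assumed label.
    val tokenized = Seq(
      (Seq("this", "is", "a", "free", "offer"), 1.0),
      (Seq("see", "you", "at", "the", "meeting"), 0.0)
    ).toDF("mailFeatureWords", "label")

    // Same configuration as in the text: read tokens from mailFeatureWords,
    // write the filtered tokens to noStopWordsMailFeatures.
    val stopWordRemover = new StopWordsRemover()
      .setInputCol("mailFeatureWords")
      .setOutputCol("noStopWordsMailFeatures")

    // transform appends the output column and leaves the input columns in place.
    val noStopWords = stopWordRemover.transform(tokenized)
    noStopWords.show(false) // false: do not truncate long cell values

    spark.stop()
  }
}
```

By default, StopWordsRemover uses its built-in English stop word list and matches case-insensitively. If the mail corpus needs a different list, setStopWords accepts a custom array of words.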