May 2018
Beginner to intermediate
384 pages
10h 19m
English
The following code snippet implements sentiment analysis based on the NLP theory discussed in this chapter. It uses Spark ML libraries on Twitter JSON records to train a model that classifies a message's sentiment as happy or unhappy. It looks for keywords such as "happy" in each Twitter message and flags matching messages with the value 1, indicating a happy sentiment. All other messages are flagged with the value 0, which represents an unhappy sentiment. Finally, TF-IDF is applied to convert the message text into weighted features, which are then used to train the model:
import org.apache.spark.ml.feature.{HashingTF, RegexTokenizer, StopWordsRemover, IDF}
import org.apache.spark.sql.functions._
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.Pipeline
...
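Since the listing above is cut off after its imports, the steps described in the paragraph can be sketched as follows. This is a minimal sketch under stated assumptions, not the book's actual code: the object name `SentimentSketch`, the helpers `label` and `buildPipeline`, the column names, the hyperparameters, and the four in-memory sample messages (standing in for the Twitter JSON records) are all hypothetical. It assumes Spark 2.x or later with `spark-sql` and `spark-mllib` on the classpath.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, IDF, RegexTokenizer, StopWordsRemover}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object SentimentSketch {

  // Keyword rule from the text: 1.0 if the message contains "happy"
  // (case-insensitive), otherwise 0.0. The "text" column name is an assumption.
  def label(tweets: DataFrame): DataFrame =
    tweets.withColumn(
      "label",
      when(lower(col("text")).contains("happy"), 1.0).otherwise(0.0))

  // Tokenize -> remove stop words -> term frequencies -> IDF weighting -> classifier.
  def buildPipeline(): Pipeline = {
    val tokenizer = new RegexTokenizer()
      .setInputCol("text").setOutputCol("words").setPattern("\\W+")
    val remover = new StopWordsRemover()
      .setInputCol("words").setOutputCol("filtered")
    val tf = new HashingTF()
      .setInputCol("filtered").setOutputCol("rawFeatures").setNumFeatures(1024)
    val idf = new IDF()
      .setInputCol("rawFeatures").setOutputCol("features")
    // Hyperparameters are illustrative, not taken from the book.
    val lr = new LogisticRegression().setMaxIter(20).setRegParam(0.01)
    new Pipeline().setStages(Array(tokenizer, remover, tf, idf, lr))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SentimentSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample messages; the book reads Twitter JSON records instead.
    val tweets = Seq(
      "So happy with the new release today",
      "Stuck in traffic again this morning",
      "Happy birthday to my best friend",
      "The service was slow and the food was cold"
    ).toDF("text")

    val labelled = label(tweets)
    val model = buildPipeline().fit(labelled)

    // Show each message with its keyword-derived label and the model's prediction.
    model.transform(labelled).select("text", "label", "prediction").show(false)

    spark.stop()
  }
}
```

Because the labels are derived from a keyword rule, a model trained this way mostly learns to recognise that keyword. The sketch is useful for seeing how the pipeline stages fit together rather than as a measure of real sentiment accuracy.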