July 2018
Intermediate to advanced
334 pages
8h 20m
English
In this step, we use the ++ method (Spark's alias for union) to combine the ham and spam RDDs, hamRDD3 and spamRDD3, into a single RDD:
val hamAndSpam: org.apache.spark.rdd.RDD[LabeledHamSpam] = hamRDD3 ++ spamRDD3
hamAndSpam: org.apache.spark.rdd.RDD[LabeledHamSpam] = UnionRDD[20] at $plus$plus at <console>:34
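On an RDD, `++` simply delegates to `union`: the result is a `UnionRDD` whose partitions are the parents' partitions laid end to end, with no shuffle and no de-duplication. The spark-shell-style sketch below illustrates this. The `LabeledHamSpam(label: Double, punctLessSentences: String)` case class, the 0.0/1.0 labeling, and the sample messages are hypothetical stand-ins, because this excerpt does not show how the book defines `hamRDD3` and `spamRDD3`.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical case class: the book's real LabeledHamSpam fields are not shown here.
case class LabeledHamSpam(label: Double, punctLessSentences: String)

// Reuses the shell's session if one exists; otherwise starts a local one.
val spark = SparkSession.builder().master("local[*]").appName("HamSpamUnion").getOrCreate()
val sc = spark.sparkContext

// Hypothetical sample data; 0.0 = ham and 1.0 = spam is an assumed convention.
val hamRDD3: RDD[LabeledHamSpam] = sc.parallelize(Seq(
  LabeledHamSpam(0.0, "see you at lunch"),
  LabeledHamSpam(0.0, "call me when you land")), numSlices = 2)
val spamRDD3: RDD[LabeledHamSpam] = sc.parallelize(Seq(
  LabeledHamSpam(1.0, "you have won a free prize"),
  LabeledHamSpam(1.0, "claim your reward now"),
  LabeledHamSpam(1.0, "claim your reward now")), numSlices = 1)

// ++ calls union under the hood: partitions are concatenated, nothing is shuffled.
val hamAndSpam: RDD[LabeledHamSpam] = hamRDD3 ++ spamRDD3

println(hamAndSpam.count())           // 5: the duplicate spam message is kept
println(hamAndSpam.getNumPartitions)  // 3: 2 ham partitions + 1 spam partition
```

If duplicate messages should be dropped, a `distinct()` call after the union does that, but unlike `union` it triggers a shuffle.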
Next, let's create a dataframe with two columns, punctLessSentences and label, as the following snippet shows:
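// An RDD has no select method, so convert hamAndSpam with toDF() first (spark.implicits._ is pre-imported in spark-shell)
val hamAndSpamDFrame = hamAndSpam.toDF().select("punctLessSentences", "label")
hamAndSpamDFrame: org.apache.spark.sql.DataFrame = [punctLessSentences: string, label: double]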
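`select` is a DataFrame (Dataset) method, so an RDD of case-class records is typically turned into a DataFrame with `toDF()` from `spark.implicits._`; the column names are then taken from the case-class fields. The following spark-shell-style sketch shows that conversion followed by a two-column `select`. The `LabeledHamSpam(label: Double, punctLessSentences: String)` definition and the two sample rows are hypothetical, since this excerpt does not show the book's own definitions.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical case class: its field names become the DataFrame's column names.
case class LabeledHamSpam(label: Double, punctLessSentences: String)

val spark = SparkSession.builder().master("local[*]").appName("HamSpamSelect").getOrCreate()
import spark.implicits._  // provides toDF() on RDDs of case classes and the $"col" syntax

// Hypothetical stand-in for the unioned ham/spam RDD.
val hamAndSpam: RDD[LabeledHamSpam] = spark.sparkContext.parallelize(Seq(
  LabeledHamSpam(0.0, "see you at lunch"),
  LabeledHamSpam(1.0, "you have won a free prize")))

// Convert to a DataFrame, then keep the text column first and the label second.
val hamAndSpamDFrame: DataFrame = hamAndSpam.toDF().select($"punctLessSentences", $"label")

println(hamAndSpamDFrame.columns.mkString(", "))  // punctLessSentences, label
hamAndSpamDFrame.show(truncate = false)
```

Listing the columns explicitly in `select` also fixes their order, which is independent of the order of the fields in the case class.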
We created the new dataframe. Let's display ...