July 2017
Intermediate to advanced
796 pages
18h 55m
English
A DoubleRDD is an RDD whose elements are all double values. Because every element is numeric, many statistical functions, such as mean, min, max, and stdev, are available on a DoubleRDD.
The following example creates a DoubleRDD from a sequence of double numbers and calls several statistical functions on it:
scala> val rdd_one = sc.parallelize(Seq(1.0,2.0,3.0))
rdd_one: org.apache.spark.rdd.RDD[Double] = ParallelCollectionRDD[52] at parallelize at <console>:25

scala> rdd_one.mean
res62: Double = 2.0

scala> rdd_one.min
res63: Double = 1.0

scala> rdd_one.max
res64: Double = 3.0

scala> rdd_one.stdev
res65: Double = 0.816496580927726
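The stdev result above may look surprising: it is the population standard deviation (dividing by n), not the sample standard deviation (dividing by n - 1), which Spark exposes separately as sampleStdev. The following is a minimal plain-Scala sketch, without Spark, of what these statistics compute on the same three values. The object name DoubleStats and its helper functions are hypothetical names used only for illustration; they are not part of the Spark API.

```scala
// Plain-Scala sketch (no Spark): what mean and stdev compute for 1.0, 2.0, 3.0.
// DoubleStats and its methods are illustrative names, not Spark APIs.
object DoubleStats {
  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Sum of squared deviations from the mean.
  private def squaredDeviations(xs: Seq[Double]): Double = {
    val m = mean(xs)
    xs.map(x => (x - m) * (x - m)).sum
  }

  // Population standard deviation: divide by n (the convention stdev follows).
  def populationStdev(xs: Seq[Double]): Double =
    math.sqrt(squaredDeviations(xs) / xs.size)

  // Sample standard deviation: divide by n - 1 (the convention sampleStdev follows).
  def sampleStdev(xs: Seq[Double]): Double =
    math.sqrt(squaredDeviations(xs) / (xs.size - 1))

  def main(args: Array[String]): Unit = {
    val xs = Seq(1.0, 2.0, 3.0)
    println(mean(xs))            // 2.0
    println(populationStdev(xs)) // approximately 0.8165, matching rdd_one.stdev
    println(sampleStdev(xs))     // 1.0
  }
}
```

For small datasets the two conventions differ noticeably, so it is worth checking which one a report or downstream calculation expects.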
The following diagram shows a DoubleRDD and how the sum() function runs on it: