March 2019
Beginner to intermediate
182 pages
4h 6m
English
In this section, we will reuse the same RDD for several different actions. First, we will minimize the execution time by reusing the RDD, and then we will look at caching and at a performance test for our code.
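To see why reusing an uncached RDD can be expensive, here is a plain-Scala analogy. It is an illustration under stated assumptions, not Spark's API: `LazyChain`, `LineageDemo`, `source` and `loads` are hypothetical names. Each "action" re-evaluates the whole chain back to its source, just as an uncached RDD walks its lineage again on every action. A chain marked with `cache()` loads its source only once and reuses the stored result afterwards.

```scala
// Plain-Scala analogy (NOT Spark's API; all names are hypothetical):
// an uncached chain recomputes from its source on every action.
final class LazyChain[A](compute: () => Seq[A]) {
  private var useCache = false
  private var cached: Option[Seq[A]] = None // filled on the first action after cache()

  // Transformation: lazy, only builds a new chain on top of this one.
  def map[B](f: A => B): LazyChain[B] = new LazyChain(() => data().map(f))

  // Marks the chain for caching; nothing is computed yet (like rdd.cache()).
  def cache(): this.type = { useCache = true; this }

  private def data(): Seq[A] =
    if (!useCache) compute()
    else cached match {
      case Some(xs) => xs
      case None =>
        val xs = compute()
        cached = Some(xs)
        xs
    }

  // "Actions": each one triggers evaluation of the chain.
  def collect(): List[A] = data().toList
  def count(): Long = data().size.toLong
  def first(): A = data().head
}

object LineageDemo {
  var loads = 0 // how many times the hypothetical "source" was read

  def source(): LazyChain[Int] =
    new LazyChain(() => { loads += 1; Seq(1, 2, 3) })

  def runActions(chain: LazyChain[Int]): Unit = {
    chain.collect(); chain.count(); chain.first()
  }

  def main(args: Array[String]): Unit = {
    loads = 0
    runActions(source().map(_ * 2))
    println(s"without cache: $loads loads") // 3 loads: one per action

    loads = 0
    runActions(source().map(_ * 2).cache())
    println(s"with cache: $loads loads") // 1 load: later actions reuse the cache
  }
}
```

As in Spark, marking the chain for caching is itself lazy: the source is read on the first action, and only the later actions benefit.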
The following example is the test from the preceding section, slightly modified: here we record the start time with System.currentTimeMillis() before running the actions and compare it with the time after they finish. In other words, we measure the total time taken by all of the actions that are executed:
```scala
// Without caching, every call to an action means that we go up the RDD chain again.
// If we are loading data from an external file system (e.g., HDFS), every action means
// that we need to load it from the file system again.
val start = System.currentTimeMillis()
println(rdd.collect().toList)
println(rdd.count())
println(rdd.first())
...
```
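The same measurement can be wrapped in a small reusable helper. This is a sketch, and `Timing.timed` is a hypothetical name, not part of Spark. It uses `System.nanoTime()`, which is intended for measuring elapsed intervals, rather than `System.currentTimeMillis()`, and returns the block's result together with the elapsed milliseconds.

```scala
// Hypothetical helper (not part of Spark): runs a block once and returns
// its result together with the elapsed wall-clock time in milliseconds.
object Timing {
  def timed[T](block: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = block // by-name parameter: evaluated exactly once, here
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }

  def main(args: Array[String]): Unit = {
    val data = (1 to 1000000).toVector
    val (sum, ms) = timed { data.map(_.toLong).sum }
    println(s"sum = $sum, took $ms ms")
  }
}
```

With an RDD in scope, a call such as `Timing.timed { rdd.count() }` would time a single action. When comparing cached and uncached runs, keep in mind that the first action on a cached RDD also pays the cost of materializing the cache.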