Fast Data Processing with Spark 2 - Third Edition by Krishna Sankar

Saving your data

Distributed computational jobs are a lot of fun, but they are far more useful when their results are stored somewhere they can be used. The methods for loading an RDD are largely found in the SparkContext class, but the methods for saving an RDD are defined on the RDD classes themselves. In Scala, implicit conversions let an RDD of key/value pairs whose types can be written to a sequence file be converted to the appropriate type, so methods such as saveAsSequenceFile become available without a cast; in Java, explicit conversions must be used.
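
As a minimal sketch of that implicit conversion, assuming a local Spark 2 setup with spark-core and spark-sql on the classpath, the following writes an RDD of (String, Int) pairs with saveAsSequenceFile and reads it back with SparkContext.sequenceFile. The object name SequenceFileSketch, the helper roundTrip, and the temporary output path are hypothetical choices for illustration, not part of the book's code.

```scala
import java.nio.file.Files
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object SequenceFileSketch {
  // Hypothetical helper: writes pairs as a sequence file, then reads them back sorted by key.
  def roundTrip(sc: SparkContext, pairs: Seq[(String, Int)], dir: String): Array[(String, Int)] = {
    val keyValueRdd = sc.parallelize(pairs)
    // No explicit conversion is written: the implicit RDD => SequenceFileRDDFunctions
    // conversion applies because String and Int have WritableFactory instances
    // (mapping them to Text and IntWritable).
    keyValueRdd.saveAsSequenceFile(dir)
    // sequenceFile[K, V] converts Text/IntWritable back to String/Int implicitly.
    sc.sequenceFile[String, Int](dir).collect().sortBy(_._1)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder().master("local[*]").appName("SequenceFileSketch").getOrCreate()
    // Fresh temporary location, because saving refuses to overwrite an existing directory.
    val dir = Files.createTempDirectory("spark-seq").resolve("sequenceOut").toString
    println(roundTrip(spark.sparkContext, Seq(("a", 1), ("b", 2), ("c", 3)), dir).mkString(", "))
    spark.stop()
  }
}
```

Running main should print the restored pairs in key order. The point of the sketch is that the pair RDD gains saveAsSequenceFile purely through the compiler finding the conversion; the Java API offers no equivalent shortcut.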

Here are some of the ways to save an RDD.

Here's the code for Scala:

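// Writes a directory named out.txt holding one text part file per partition
rddOfStrings.saveAsTextFile("out.txt")
// Writes the elements as serialized Java objects inside a SequenceFile
keyValueRdd.saveAsObjectFile("sequenceOut")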
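
The Scala snippet above assumes rddOfStrings and keyValueRdd already exist. A self-contained sketch, assuming a local SparkSession and a scratch directory (the names SaveRddSketch and saveAndReload are hypothetical), saves both kinds of RDD and reloads them with SparkContext.textFile and SparkContext.objectFile, which also shows that each output path ends up as a directory rather than a single file.

```scala
import java.io.File
import java.nio.file.Files
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object SaveRddSketch {
  // Hypothetical helper: saves sample RDDs under `base` and reads them back.
  def saveAndReload(sc: SparkContext, base: String): (Array[String], Array[(String, Int)]) = {
    val rddOfStrings = sc.parallelize(Seq("alpha", "beta", "gamma"), numSlices = 2)
    val keyValueRdd = sc.parallelize(Seq(("alpha", 1), ("beta", 2)))

    val textDir = new File(base, "out.txt").getPath
    val objectDir = new File(base, "sequenceOut").getPath

    // Each call creates a directory with part-NNNNN files, one per partition.
    rddOfStrings.saveAsTextFile(textDir)
    keyValueRdd.saveAsObjectFile(objectDir)

    // textFile yields one String per line; objectFile needs the element type.
    val texts = sc.textFile(textDir).collect().sorted
    val pairs = sc.objectFile[(String, Int)](objectDir).collect().sortBy(_._1)
    (texts, pairs)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("SaveRddSketch").getOrCreate()
    // Fresh scratch directory so the saveAs* calls do not hit an existing path.
    val base = Files.createTempDirectory("spark-save").toString
    val (texts, pairs) = saveAndReload(spark.sparkContext, base)
    println(texts.mkString(", "))
    println(pairs.mkString(", "))
    spark.stop()
  }
}
```

Because "out.txt" is really a directory, downstream tools should read the whole directory (as textFile does here) rather than expect a single file with that name.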

Here's the code for Java:

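// Writes a directory named out.txt holding one text part file per partition
rddOfStrings.saveAsTextFile("out.txt");
// Writes the elements as serialized Java objects inside a SequenceFile
keyValueRdd.saveAsObjectFile("sequenceOut");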

Here's the code for Python: ...