July 2017
Intermediate to advanced
796 pages
18h 55m
English
wholeTextFiles() loads multiple text files into a pair RDD of <filename, textOfFile> pairs, where each pair holds a file's name and its entire content. This is useful when loading many small text files. It differs from the textFile API, which produces one record per line: with wholeTextFiles(), the entire content of each file is loaded as a single record:
sc.wholeTextFiles(path, minPartitions=None, use_unicode=True)
The following is an example of loading a text file into an RDD using wholeTextFiles():
scala> val rdd_whole = sc.wholeTextFiles("wiki1.txt")
rdd_whole: org.apache.spark.rdd.RDD[(String, String)] = wiki1.txt MapPartitionsRDD[37] at wholeTextFiles at <console>:25
scala> rdd_whole.take(10)
res56: Array[(String, ...
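To make the difference between the two APIs concrete, here is a minimal sketch meant for the Spark shell, where `sc` is already defined. The directory `data/wiki` is a hypothetical path assumed to hold a few small text files, and the per-file line count is only an approximation based on splitting on newlines.

```scala
// Run inside spark-shell, where `sc` (the SparkContext) is predefined.
// "data/wiki" is a hypothetical directory of small text files (an assumption).
val lines = sc.textFile("data/wiki")        // RDD[String]: one record per line
val files = sc.wholeTextFiles("data/wiki")  // RDD[(String, String)]: one record per file

// Each record is a (path, content) pair, so per-file work is a simple mapValues.
// Splitting on "\n" gives an approximate line count for each file.
val lineCounts = files.mapValues(content => content.split("\n").length)

lineCounts.collect().foreach { case (path, n) => println(s"$path: $n lines") }

// textFile yields one record per line across all files;
// wholeTextFiles yields one record per file.
println(s"textFile records: ${lines.count()}, wholeTextFiles records: ${files.count()}")
```

Because every file becomes a single record, this approach suits many small files. A very large file would have to fit in memory as one string on a single executor.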