July 2017
Intermediate to advanced
796 pages
18h 55m
English
You can read a raw text file using the textFile() method. Suppose you have a log of purchases:
number\tproduct_name\ttransaction_id\twebsite\tprice\tdate
0\tjeans\t30160906182001\tebay.com\t100\t12-02-2016
1\tcamera\t70151231120504\tamazon.com\t450\t09-08-2017
2\tlaptop\t90151231120504\tebay.ie\t1500\t07--5-2016
3\tbook\t80151231120506\tpackt.com\t45\t03-12-2016
4\tdrone\t8876531120508\talibaba.com\t120\t01-05-2017
Reading the file and creating an RDD is straightforward with the textFile() method:
myRDD = spark.sparkContext.textFile("sample_raw_file.txt")
$ cd myRDD
$ cat part-00000
number\tproduct_name\ttransaction_id\twebsite\tprice\tdate
0\tjeans\t30160906182001\tebay.com\t100\t12-02-2016
1\tcamera\t70151231120504\tamazon.com\t450\t09-08-2017
...
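Each element of an RDD created by textFile() is one line of the file as a plain string, so a common next step is to split every line into its fields. The following is a minimal sketch of that step in plain Python, assuming the columns are separated by real tab characters. The names HEADER, parse_line, and records are hypothetical and used only for illustration; the two sample rows are taken from the log shown earlier.

```python
# Hypothetical names for illustration; the sample rows come from the purchase log.
# Assumption: fields are separated by real tab characters.
HEADER = "number\tproduct_name\ttransaction_id\twebsite\tprice\tdate"


def parse_line(line):
    """Split one tab-separated log line into a dict keyed by column name."""
    columns = HEADER.split("\t")
    values = line.rstrip("\n").split("\t")
    return dict(zip(columns, values))


# Stand-in for the lines that textFile() would yield, one string per line.
lines = [
    HEADER,
    "0\tjeans\t30160906182001\tebay.com\t100\t12-02-2016",
    "1\tcamera\t70151231120504\tamazon.com\t450\t09-08-2017",
]

# Skip the header row, then parse the remaining lines.
records = [parse_line(line) for line in lines if line != HEADER]

# With Spark, the same two steps could be applied to the RDD itself:
#   records = myRDD.filter(lambda line: line != HEADER).map(parse_line)
```

All values stay strings after the split, so a field such as price would still need an explicit conversion before any arithmetic.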