March 2019
Beginner to intermediate
182 pages
4h 6m
English
raw_data = sc.textFile("./kddcup.data.gz")
raw_data
Entering raw_data on its own displays the following output:
./kddcup.data.gz MapPartitionsRDD[3] at textFile at NativeMethodAccessorImpl.java:0
When we enter the raw_data variable, Spark does not print the contents of the file. Instead, it shows the path of the underlying data file, kddcup.data.gz, and tells us that the object is a MapPartitionsRDD created by textFile.
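Because the description above shows only where the data comes from, it can help to run a few actions on the RDD to see what it holds. The following is a minimal sketch, not code from this book: summarize_rdd is a hypothetical helper name, and it relies only on the standard RDD methods getNumPartitions(), count(), and take().

```python
def summarize_rdd(rdd, n=2):
    """Run a few actions on an RDD and collect the results in a dict.

    `summarize_rdd` is a hypothetical helper, not part of PySpark; it only
    calls the RDD methods getNumPartitions(), count(), and take().
    """
    return {
        "partitions": rdd.getNumPartitions(),  # how many pieces the data is split into
        "records": rdd.count(),                # action: reads the whole file to count lines
        "sample": rdd.take(n),                 # action: returns the first n lines as strings
    }


# Usage, assuming the `sc` and `raw_data` defined earlier:
# summary = summarize_rdd(raw_data)
# print(summary["records"], summary["sample"])
```

Note that count() has to read the entire file, so on a large dataset it may take noticeably longer than take(), which stops once it has enough lines.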
Now that we know how to load the data into Spark, let's learn about parallelization with Spark RDDs.