July 2017
Intermediate to advanced
796 pages
18h 55m
English
Let's see how to read data in LIBSVM format using the read API and the load() method by specifying the format of the data (that is, libsvm) as follows:
# Creating DataFrame from libsvm dataset
myDF = spark.read.format("libsvm").load("C:/Exp/mnist.bz2")
The MNIST dataset used here can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/mnist.bz2. The load() call returns a DataFrame, whose contents can be displayed by calling the show() method as follows:
myDF.show()
The output is as follows:
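The DataFrame produced by the libsvm source has two columns: label (a double) and features (a sparse vector). Each line of a LIBSVM file has the form label index:value index:value ..., with 1-based feature indices that Spark shifts to 0-based when it builds the vector. The following is a minimal pure-Python sketch of that mapping, not Spark's own implementation; the helper name parse_libsvm_line and the sample line are made up for illustration and are not taken from the MNIST file:

```python
from typing import Dict, Tuple


def parse_libsvm_line(line: str) -> Tuple[float, Dict[int, float]]:
    """Parse one 'label index:value ...' line into (label, sparse features).

    Hypothetical helper for illustration only; Spark's libsvm reader does
    this work internally and returns a vector, not a dict.
    """
    tokens = line.split()
    label = float(tokens[0])
    features: Dict[int, float] = {}
    for token in tokens[1:]:
        index, value = token.split(":")
        # LIBSVM indices are 1-based; shift to 0-based, as Spark's vectors are.
        features[int(index) - 1] = float(value)
    return label, features


# Made-up sample line in LIBSVM format (assumption, not real MNIST data).
sample = "5 153:3 154:18 155:18"
label, features = parse_libsvm_line(sample)
print(label, features)  # 5.0 {152: 3.0, 153: 18.0, 154: 18.0}
```

Only the non-zero entries are stored, which is why the format suits sparse data such as MNIST pixel vectors.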

You can also specify ...