Analyzing JSON input modeled as a graph 

In this section, we will analyze a JSON Dataset modeled as a graph. We will apply GraphFrame functions from the previous sections and introduce some new ones.

For the hands-on exercises in this section, we use a Dataset containing Amazon product metadata: product information and reviews for 548,552 products. This Dataset can be downloaded from https://snap.stanford.edu/data/amazon-meta.html.

To simplify processing, the original Dataset was converted to a JSON file in which each line is a complete record. Use the Java program (Preprocess.java) provided with this chapter to perform the conversion.
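The chapter's Preprocess.java is not reproduced here. As a rough illustration of the "one complete record per line" idea, the following hypothetical Java sketch flattens a single record into a one-line JSON object. The class RecordToJsonLine and its methods are our own names. It assumes a simplified record made of `key: value` lines, emits every value as a JSON string, and ignores nested or multi-line fields such as reviews, which a real converter would have to handle.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, not the chapter's Preprocess.java: turns one simplified
 * record of "key: value" lines into a single-line JSON object.
 */
public class RecordToJsonLine {

    // Escapes characters that must not appear raw inside a JSON string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters use the 4-digit hex escape form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Splits each line at its FIRST ':' (so titles containing ':' survive),
    // trims key and value, and skips lines that contain no ':' at all.
    static String toJsonLine(List<String> recordLines) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps input order
        for (String line : recordLines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String key = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            fields.put(key, value);
        }
        StringBuilder json = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                json.append(',');
            }
            first = false;
            json.append('"').append(escape(e.getKey())).append("\":\"")
                .append(escape(e.getValue())).append('"');
        }
        return json.append('}').toString();
    }

    public static void main(String[] args) {
        // Illustrative values only; they are not taken from the dataset.
        List<String> record = Arrays.asList(
            "Id:   7",
            "ASIN: 0000000000",
            "  title: Example Title: A Subtitle",
            "  group: Book");
        System.out.println(toJsonLine(record));
    }
}
```

Running `main` prints `{"Id":"7","ASIN":"0000000000","title":"Example Title: A Subtitle","group":"Book"}`. Writing one such object per line yields the line-delimited JSON layout that Spark's JSON reader expects by default.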

First, we create a DataFrame from the input file and print out the schema and a few sample records. ...
