Analyzing JSON input modeled as a graph 

In this section, we will analyze a JSON Dataset modeled as a graph. We will apply GraphFrame functions from the previous sections and introduce some new ones.

For the hands-on exercises in this section, we use a Dataset containing Amazon product metadata: product information and reviews for 548,552 products. This Dataset can be downloaded from https://snap.stanford.edu/data/amazon-meta.html.

To simplify processing, the original Dataset was converted to a JSON file in which each line represents one complete record. Use the Java program (Preprocess.java) provided with this chapter for the conversion.
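
To make the target layout concrete, here is a minimal sketch of the one-record-per-line JSON format, written in Python using only the standard library. It is not the Preprocess.java program, and the field names (Id, ASIN, title, similar) and sample values are hypothetical placeholders rather than the schema that program actually emits.

```python
import json

# Hypothetical records: the field names and values are illustrative
# assumptions, not the actual schema produced by Preprocess.java.
records = [
    {"Id": 1, "ASIN": "A0000000001", "title": "First product",
     "similar": ["A0000000002"]},
    {"Id": 2, "ASIN": "A0000000002", "title": "Second product",
     "similar": []},
]


def to_json_lines(recs):
    """Serialize each record as one compact JSON object per line."""
    return "\n".join(json.dumps(r) for r in recs)


def from_json_lines(text):
    """Parse line-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


text = to_json_lines(records)    # two lines, one complete record each
parsed = from_json_lines(text)   # round-trips back to the original records
```

Because every line is a complete JSON object, a reader can process the file one record at a time and split it across workers at line boundaries. This is also the layout that Spark's JSON reader expects by default.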

First, we create a DataFrame from the input file, and print out the schema and a few sample records. ...
