In this section, we will analyze a JSON Dataset modeled as a graph. We will apply GraphFrame functions from the previous sections and introduce some new ones.
For the hands-on exercises in this section, we use a Dataset of Amazon product metadata: product information and reviews for 548,552 products. The Dataset can be downloaded from https://snap.stanford.edu/data/amazon-meta.html.
To simplify processing, the original Dataset was converted to a JSON file in which each line represents one complete record. Use the Java program (Preprocess.java) provided with this chapter to perform the conversion.
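To make the target layout concrete, the following is a minimal sketch of the one-record-per-line idea. It is not the chapter's Preprocess.java. It assumes a simplified input of flat `key: value` lines with a blank line between records, which is not the full layout of the Amazon file (nested category and review lines would need extra handling). The class name `JsonLinesSketch`, its methods, and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Preprocess.java shipped with the chapter.
public class JsonLinesSketch {

    // Escape characters that may not appear unescaped inside a JSON string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Serialize one record as a single-line JSON object (every value kept as a string).
    static String toJsonLine(Map<String, String> record) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : record.entrySet()) {
            if (!first) {
                sb.append(",");
            }
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
            first = false;
        }
        return sb.append("}").toString();
    }

    // Assumed input: "key: value" lines, with a blank line closing each record.
    static List<String> convert(List<String> lines) {
        List<String> out = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                if (!current.isEmpty()) {
                    out.add(toJsonLine(current));
                    current = new LinkedHashMap<>();
                }
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip lines that are not "key: value"
            }
            current.put(line.substring(0, colon).trim(),
                        line.substring(colon + 1).trim());
        }
        if (!current.isEmpty()) {
            out.add(toJsonLine(current)); // flush the last record
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample records, not taken from the Amazon file.
        List<String> sample = Arrays.asList(
            "Id: 1", "title: A \"quoted\" title", "group: Book",
            "",
            "Id: 2", "title: Another title", "group: Music");
        for (String jsonLine : convert(sample)) {
            System.out.println(jsonLine);
        }
    }
}
```

Writing one JSON object per line matters because Spark's JSON reader expects that layout by default, so each line can be parsed independently as one record.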
First, we create a DataFrame from the input file, and print out the schema and a few sample records. ...