February 2017
Intermediate to advanced
274 pages
5h 58m
English
Expanding upon our tripGraph GraphFrame, the following query will allow us to find the most popular non-stop flights in the US (for this dataset):
# Determine the most popular non-stop flights
import pyspark.sql.functions as func
topTrips = tripGraph \
    .edges \
    .groupBy("src", "dst") \
    .agg(func.count("delay").alias("trips"))
# Show the top 20 most popular flights (single city hops)
display(topTrips.orderBy(topTrips.trips.desc()).limit(20))
Note that although the query counts the delay column, it is simply counting the number of trips for each src/dst pair. Here's the output:

As can be observed from this query, the two most ...
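To make the aggregation concrete, here is a minimal sketch in plain Python (no Spark required) of what the query computes: it groups edges by (src, dst) and counts the rows in each group. The sample edges and the names edges, trips, and top_trips are hypothetical, chosen only for illustration, and are not taken from the dataset. The sketch skips rows whose delay is None because Spark's count on a column ignores null values.

```python
from collections import Counter

# Hypothetical sample edges as (src, dst, delay); not real data from the dataset.
edges = [
    ("SFO", "LAX", 5),
    ("SFO", "LAX", -2),
    ("LAX", "SFO", 12),
    ("SFO", "LAX", 0),
    ("JFK", "LAX", 30),
    ("LAX", "SFO", 7),
]

# Mirrors groupBy("src", "dst").agg(count("delay")):
# count the rows per (src, dst) pair, skipping rows with a missing delay.
trips = Counter((src, dst) for src, dst, delay in edges if delay is not None)

# Mirrors orderBy(trips.desc()).limit(20): the 20 most frequent pairs.
top_trips = trips.most_common(20)

for (src, dst), n in top_trips:
    print(src, dst, n)
```

Because the delay values themselves are never used, any column without nulls would give the same counts, which is why the query amounts to a trip count.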