
7.2.1 Exploratory Analysis
We start with some exploratory analysis to get a feel for the data, for example the number of airports:
g.vertices.count()
1435
How many connections are there between these airports?
g.edges.count()
616529
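To make these two counts concrete, here is a minimal plain-Python sketch of what `g.vertices.count()` and `g.edges.count()` measure, written so that it runs without Spark. The three airports and four flights below are invented for illustration and are not taken from the dataset used in this chapter. The sketch assumes that each row of the edge table is one connection, so repeated connections between the same pair of airports are counted separately.

```python
# Toy data, invented for illustration only (not the chapter's dataset).
vertices = [
    {"id": "ATL", "name": "Atlanta"},
    {"id": "ORD", "name": "Chicago O'Hare"},
    {"id": "DFW", "name": "Dallas Fort Worth"},
]

# Each tuple is one connection (src, dst); duplicates are separate rows.
edges = [
    ("ATL", "ORD"),
    ("ATL", "ORD"),
    ("ATL", "DFW"),
    ("ORD", "ATL"),
]

vertex_count = len(vertices)        # analogous to g.vertices.count()
edge_count = len(edges)             # analogous to g.edges.count()
distinct_routes = len(set(edges))   # unique (src, dst) pairs, ignoring repeats

print(vertex_count, edge_count, distinct_routes)
```

The gap between `edge_count` and `distinct_routes` shows why an edge count can be far larger than the number of airport pairs that are actually linked.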
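7.2.2 Popular Airports
Which airports have the most departing flights? We can count the outgoing flights from each airport using the degree centrality algorithm: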
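# Out-degree per airport; rename "id" so it does not clash with g.vertices.id in the join
airports_degree = g.outDegrees.withColumnRenamed("id", "oId")

# Attach airport names and rank by number of outgoing flights
full_airports_degree = (airports_degree
                        .join(g.vertices, airports_degree.oId == g.vertices.id)
                        .sort("outDegree", ascending=False)
                        .select("id", "name", "outDegree"))

full_airports_degree.show(n=10, truncate=False)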
Running this code produces the following output:
id   name                                              outDegree
ATL  Hartsfield Jackson Atlanta International Airport  33837
ORD  Chicago O'Hare International Airport              28338
DFW  Dallas Fort Worth International Airport           23765
CLT  Charlotte Douglas International Airport           ...
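As a rough illustration of what the out-degree ranking computes, the following plain-Python sketch counts outgoing connections per airport, attaches names, and sorts in descending order. It is not the GraphFrames implementation; the helper `top_out_degree` and the toy data are hypothetical and exist only for this example.

```python
from collections import Counter

# Toy data, invented for illustration only (not the chapter's dataset).
names = {
    "ATL": "Atlanta",
    "ORD": "Chicago O'Hare",
    "DFW": "Dallas Fort Worth",
}
flights = [
    ("ATL", "ORD"), ("ATL", "DFW"), ("ATL", "ORD"),
    ("ORD", "ATL"), ("ORD", "DFW"),
    ("DFW", "ATL"),
]


def top_out_degree(flights, names, n=10):
    """Return up to n (id, name, outDegree) rows, highest out-degree first."""
    # Out-degree: how many edges start at each airport.
    out_degree = Counter(src for src, _dst in flights)
    # Keep only airports with a known name (inner-join behaviour).
    rows = [(aid, names[aid], deg) for aid, deg in out_degree.items() if aid in names]
    # Sort by out-degree, largest first, and keep the top n.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:n]


result = top_out_degree(flights, names)
for row in result:
    print(row)
```

Airports that never appear as a source get no row at all in this sketch, because the counter only sees airports with at least one outgoing edge.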