Graph Algorithms for Data Analysis: Based on Spark and Neo4j

by Mark Needham, Amy E. Hodler
September 2020
Intermediate to advanced
213 pages
5h 25m
Chinese
Posts & Telecom Press
7.2.1 Exploratory Analysis
We start with some exploratory analysis to get an overview of the data, beginning with the number of airports:
g.vertices.count()
1435
How many connections are there between these airports?
g.edges.count()
616529
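`g.edges.count()` counts the rows of the edges DataFrame. The next step reads an airport's out-degree as its number of departing flights, so each edge appears to represent a single flight, and a busy route contributes many parallel edges between the same two airports. The minimal plain-Python sketch below uses a small hypothetical flight list, not the book's dataset, to show how the edge count differs from the number of distinct routes. The helper names are illustrative only.

```python
# Hypothetical toy flight list: each (src, dst) tuple is one edge, i.e. one flight.
flights = [
    ("ATL", "ORD"),
    ("ATL", "ORD"),  # a second flight on the same route is a separate edge
    ("ORD", "ATL"),
    ("ATL", "DFW"),
    ("DFW", "CLT"),
]

def count_edges(edges):
    """Analogue of g.edges.count(): every row counts, parallel edges included."""
    return len(edges)

def count_routes(edges):
    """Number of distinct directed (src, dst) pairs."""
    return len(set(edges))

print(count_edges(flights))   # 5
print(count_routes(flights))  # 4
```

In PySpark, the distinct-route count could presumably be obtained with `g.edges.select("src", "dst").distinct().count()`, because GraphFrames stores edge endpoints in `src` and `dst` columns. This has not been run against the book's data.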
7.2.2 Popular Airports
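Which airports have the most departing flights? We can use the Degree Centrality algorithm to count each airport's outgoing flights: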
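# g.outDegrees has columns id and outDegree; rename id to oId to avoid a name clash in the join
airports_degree = g.outDegrees.withColumnRenamed("id", "oId")
# Attach airport names from the vertices, then sort by out-degree, highest first
full_airports_degree = (airports_degree
    .join(g.vertices, airports_degree.oId == g.vertices.id)
    .sort("outDegree", ascending=False)
    .select("id", "name", "outDegree"))
# Show the top 10 airports without truncating long names
full_airports_degree.show(n=10, truncate=False)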
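The PySpark pipeline above counts outgoing edges per airport, joins those counts to the vertex table to pick up airport names, and sorts in descending order. The following plain-Python sketch mirrors those three steps on hypothetical toy data. The helper names `out_degrees` and `top_airports` are illustrative and are not part of GraphFrames.

```python
from collections import Counter

# Hypothetical toy data standing in for g.vertices (id -> name) and g.edges (src, dst).
vertices = {
    "ATL": "Hartsfield Jackson Atlanta International Airport",
    "ORD": "Chicago O'Hare International Airport",
    "DFW": "Dallas Fort Worth International Airport",
    "CLT": "Charlotte Douglas International Airport",
}
edges = [("ATL", "ORD"), ("ATL", "DFW"), ("ATL", "CLT"),
         ("ORD", "ATL"), ("ORD", "DFW"), ("DFW", "ATL")]

def out_degrees(edges):
    """Count outgoing edges per source vertex, like g.outDegrees."""
    return Counter(src for src, _ in edges)

def top_airports(vertices, edges, n=10):
    """Join out-degrees with names (inner join) and sort by out-degree, highest first."""
    degrees = out_degrees(edges)
    rows = [(vid, vertices[vid], deg) for vid, deg in degrees.items() if vid in vertices]
    # Sort by out-degree descending; break ties on id so the output is deterministic.
    rows.sort(key=lambda r: (-r[2], r[0]))
    return rows[:n]

for row in top_airports(vertices, edges):
    print(row)
```

As with `g.outDegrees`, an airport with no departing flights (CLT in this toy data) gets no degree row at all, so it drops out of the inner join. The tie-break on `id` is an addition for reproducible output. Spark's `sort` makes no ordering guarantee among rows with equal `outDegree`.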
Running this code produces the following output:
id   name                                              outDegree
ATL  Hartsfield Jackson Atlanta International Airport  33837
ORD  Chicago O'Hare International Airport              28338
DFW  Dallas Fort Worth International Airport           23765
CLT  Charlotte Douglas International Airport           ...
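To put these counts in perspective, each out-degree can be divided by the total edge count reported by `g.edges.count()`. If every edge is one flight, the result is each airport's share of all departures. The sketch below only performs that arithmetic on the figures printed in this section. CLT is left out because its count is cut off in the output, and `flight_share` is a hypothetical helper name.

```python
# Figures taken from the output in this section.
TOTAL_FLIGHTS = 616529  # from g.edges.count()
top_out_degree = {"ATL": 33837, "ORD": 28338, "DFW": 23765}

def flight_share(out_degree, total=TOTAL_FLIGHTS):
    """Fraction of all edges (assumed to be flights) that depart from one airport."""
    return out_degree / total

for airport, deg in top_out_degree.items():
    print(f"{airport}: {flight_share(deg):.1%}")
# ATL: 5.5%
# ORD: 4.6%
# DFW: 3.9%

# Combined share of the three busiest airports.
print(f"{flight_share(sum(top_out_degree.values())):.1%}")  # 13.9%
```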

ISBN: 9787115546678