数据分析之图算法: 基于Spark和Neo4j (Graph Algorithms for Data Analysis: Based on Spark and Neo4j)
by Mark Needham, Amy E. Hodler
September 2020 | Intermediate to advanced | 213 pages | 5h 25m | Chinese | Posts & Telecom Press
Content preview (Chapter 5)
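The damping factor d defines the probability that the next click will be through a link. Equivalently, 1 - d is the probability that a web surfer becomes bored and randomly switches to another page. A PageRank score reflects the likelihood that a page is visited through an incoming link rather than at random.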
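The random-surfer description above can be written as a formula. The following is a sketch in the commonly used normalized (probability) form, which is an assumption here; the book's own formula may be stated without the division by N. The symbols N (number of pages), In(u) (the pages linking to u), and C(v) (the number of outgoing links of v) are notation introduced for illustration.

```latex
PR(u) = \frac{1 - d}{N} + d \sum_{v \in \mathrm{In}(u)} \frac{PR(v)}{C(v)}
```

The first term is the share of visits that arrive by a random jump, and the second term is the share that arrives through incoming links.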
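A node, or group of nodes, without outgoing relationships (also called a dangling node) can monopolize the PageRank score by refusing to share it. This is known as a rank sink. You can picture it as a web surfer who gets stuck on a page, or a subset of pages, with no way out. A second difficulty arises when the nodes in a group point only to each other: as the surfer bounces back and forth among them, the circular references inflate their ranks. Both situations are shown in Figure 5-12.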
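Figure 5-12: A rank sink is caused by a node, or group of nodes, without outgoing relationships. Figure annotations: rank sinks monopolize the ranking score; A is a dangling node with no outgoing relationships, and teleportation is used to handle such dead ends; C, D, and E form a circular reference with no path out of the group, and the damping factor introduces random node visits.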
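To make the rank sink concrete, here is a minimal sketch that propagates scores with no damping and no teleportation. The edge list is a hypothetical graph chosen to resemble the figure (A dangling; C, D, and E in a cycle); the actual edges of Figure 5-12 are not given in the text, and the function name `naive_rank_step` is illustrative.

```python
def naive_rank_step(ranks, out_links):
    """One propagation step with no damping and no teleportation."""
    new_ranks = {node: 0.0 for node in ranks}
    for node, targets in out_links.items():
        for target in targets:
            # Each node splits its current score evenly among its out-links.
            new_ranks[target] += ranks[node] / len(targets)
    return new_ranks


# Hypothetical graph: A is dangling; C -> D -> E -> C is a closed cycle.
out_links = {"A": [], "B": ["A", "C"], "C": ["D"], "D": ["E"], "E": ["C"]}

ranks = {node: 1.0 / len(out_links) for node in out_links}
for _ in range(30):
    ranks = naive_rank_step(ranks, out_links)

cycle_share = ranks["C"] + ranks["D"] + ranks["E"]
print(ranks)
print(cycle_share)  # about 0.7: the cycle keeps every bit of score it receives
```

In this sketch the score that flows into A is never passed on, so it drops out of the system, while the cycle retains everything that enters it. After a few steps only C, D, and E hold any score.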
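There are two strategies for avoiding rank sinks. First, when a node with no outgoing relationships is reached, the PageRank algorithm assumes that it has outgoing relationships to all nodes. Traversing these invisible links is sometimes called teleportation. Second, the damping factor provides another way to avoid sinks by introducing a probability of visiting a random node instead of following a direct link. When d is set to 0.85, a completely random node is visited 15% of the time.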
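The two strategies above can be sketched as a small power iteration. This is an illustrative implementation under stated assumptions, not the Spark or Neo4j implementation: it uses the normalized form in which scores sum to 1, and the graph is a hypothetical one with a dangling node A and a cycle C, D, E.

```python
def pagerank(out_links, d=0.85, iterations=50):
    """Power-iteration PageRank (normalized variant, scores sum to 1)."""
    nodes = list(out_links)
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Strategy 1 (teleportation): treat each dangling node as if it
        # linked to every node, so its score is spread evenly.
        dangling = sum(ranks[node] for node in nodes if not out_links[node])
        # Strategy 2 (damping): with probability 1 - d the surfer jumps
        # to a random node instead of following a link.
        new_ranks = {node: (1.0 - d) / n + d * dangling / n for node in nodes}
        for node, targets in out_links.items():
            for target in targets:
                new_ranks[target] += d * ranks[node] / len(targets)
        ranks = new_ranks
    return ranks


# Hypothetical graph: A is dangling; C -> D -> E -> C is a closed cycle.
out_links = {"A": [], "B": ["A", "C"], "C": ["D"], "D": ["E"], "E": ["C"]}
ranks = pagerank(out_links)
for node, score in sorted(ranks.items()):
    print(node, round(score, 4))
```

With both strategies in place the total score stays at 1 and every node, including B with no incoming links, keeps a nonzero score.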
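Although the original formula recommends a damping factor of 0.85, that value was first applied to the World Wide Web, where links follow a power-law distribution (most pages have very few links and a few pages have many). Lowering the damping factor reduces the likelihood of following long relationship paths before a random jump occurs. In turn, this increases the contribution of a node's immediate predecessors to its score and rank.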
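The effect of the damping factor on long paths can be checked on a small example. The sketch below is an assumption-laden illustration: it uses a hypothetical five-node chain and a compact, normalized PageRank written only for this comparison, and it measures how much more score the end of the chain collects than the start.

```python
def pagerank(out_links, d, iterations=200):
    """Compact normalized PageRank with dangling-node teleportation."""
    n = len(out_links)
    ranks = dict.fromkeys(out_links, 1.0 / n)
    for _ in range(iterations):
        dangling = sum(r for node, r in ranks.items() if not out_links[node])
        base = (1.0 - d) / n + d * dangling / n
        new_ranks = dict.fromkeys(out_links, base)
        for node, targets in out_links.items():
            for target in targets:
                new_ranks[target] += d * ranks[node] / len(targets)
        ranks = new_ranks
    return ranks


# Hypothetical chain n0 -> n1 -> n2 -> n3 -> n4 (n4 is dangling).
chain = {"n0": ["n1"], "n1": ["n2"], "n2": ["n3"], "n3": ["n4"], "n4": []}


def end_to_start_ratio(d):
    """How many times larger the last node's score is than the first's."""
    ranks = pagerank(chain, d)
    return ranks["n4"] / ranks["n0"]


ratio_high = end_to_start_ratio(0.85)
ratio_low = end_to_start_ratio(0.5)
print(ratio_high, ratio_low)
```

On this chain the ratio works out to 1 + d + d^2 + d^3 + d^4, so a lower d shrinks the share of score that reaches n4 from distant ancestors and leaves the immediate predecessor as the dominant contributor.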
If PageRank produces unexpected results, it is worth doing some exploratory analysis of the graph to determine ...
Publisher Resources
ISBN: 9787115546678