图数据实战:用图思维和图技术解决复杂问题

by Denise Koessler Gosnell, Matthias Broecheler
March 2024
Beginner to intermediate
351 pages
7h 37m
Chinese
China Machine Press
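In Section 11.4, we walk through the merging process step by step. We want to set the right expectations for our methodology: the kind of matching and merging that these two data sources require does not need graph structure for entity resolution. We hope the details in that section help you understand why.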
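In Section 11.5, we dig into the errors we found during the merging process and introduce the difference between false positives and true negatives in the data. We also briefly cover some common problems that arise when graph structure is misused to resolve entities in data, and we show a few examples in which graph structure enhances an entity resolution pipeline.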
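To make the terms false positive and true negative concrete, here is a minimal sketch of how the outcomes of a merge could be tallied. It is an illustration only, not the procedure the chapter uses: the function `score_matches`, the record ids (`ml_1`, `kg_1`, and so on), and the hand-labeled `truth` set are all hypothetical.

```python
from collections import Counter


def score_matches(candidates, predicted, truth):
    """Count the four outcome types over a list of candidate record pairs.

    candidates: every (left_id, right_id) pair that was compared
    predicted:  the pairs the merge process declared to be the same entity
    truth:      the pairs that really are the same entity (hand-labeled)
    """
    counts = Counter()
    for pair in candidates:
        said_match = pair in predicted
        is_match = pair in truth
        if said_match and is_match:
            counts["true_positive"] += 1   # merged, and correctly so
        elif said_match:
            counts["false_positive"] += 1  # merged, but different entities
        elif is_match:
            counts["false_negative"] += 1  # same entity, but not merged
        else:
            counts["true_negative"] += 1   # kept apart, and correctly so
    return counts


# Hypothetical ids standing in for records from two sources.
candidates = [
    ("ml_1", "kg_1"),
    ("ml_1", "kg_2"),
    ("ml_2", "kg_2"),
    ("ml_3", "kg_3"),
]
predicted = {("ml_1", "kg_1"), ("ml_1", "kg_2")}
truth = {("ml_1", "kg_1"), ("ml_3", "kg_3")}

counts = score_matches(candidates, predicted, truth)
for outcome, count in sorted(counts.items()):
    print(outcome, count)
```

A false positive is a pair that was merged but should not have been. A true negative is a pair that was correctly left unmerged.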
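We have two ultimate goals for this chapter.
The first is to show what merging data really looks like. Warning: the process is not glamorous. Merging datasets is tedious work that is often overlooked, even though it is a common first step in creating a graph model.
The second is to give you an understanding of the whole problem domain. Because merging data is one of the most common first steps in creating a graph database, we hope this information helps you understand all of the tools needed to solve this complex problem. Hint: most (if not all) of the entity resolution techniques you are likely to use do not need graph structure to determine who is who.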
11.2 Defining a Different Complex Problem: Entity Resolution
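The main work in matching and merging two data sources belongs to a large problem domain called entity resolution. Informally, entity resolution is the complex problem of working out who is who, or what is what, across different data sources. Are Jon Smith and John Smith the same person? Or, in our movie data, is the movie Das Versprechen from MovieLens the same as the movie The Promise from Kaggle?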
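However, in most traditional cases, a unique user identifier that links identities may be unavailable, for many reasons: the use of external source data, data made unavailable by user privacy restrictions, or inconsistent data. ...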
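When no shared identifier exists, one common starting point is to compare the attribute values themselves. The sketch below scores the two pairs from the questions above with Python's standard `difflib.SequenceMatcher`. It is an assumption-laden illustration rather than the method this chapter develops: the helpers `normalize`, `similarity`, and `probably_same`, and the `THRESHOLD` value, are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Return a score in [0, 1]; higher means the strings are more alike."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Hypothetical cutoff; a real pipeline would tune this on labeled pairs.
THRESHOLD = 0.85


def probably_same(a, b, threshold=THRESHOLD):
    """Treat two strings as the same entity if their score clears the cutoff."""
    return similarity(a, b) >= threshold


name_score = similarity("Jon Smith", "John Smith")
title_score = similarity("Das Versprechen", "The Promise")

print("names :", round(name_score, 2), probably_same("Jon Smith", "John Smith"))
print("titles:", round(title_score, 2), probably_same("Das Versprechen", "The Promise"))
```

The two names differ by a single character and score high. The two titles share almost no characters and score low, even though one may simply be a translation of the other. String similarity alone therefore cannot settle every case, but notice that nothing in this comparison requires graph structure.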
Publisher Resources

ISBN: 9787111736288