Python文本分析
by Jens Albrecht, Sidharth Ramachandran, Christian Winkler
August 2022
Intermediate to advanced
441 pages
11h 26m
Chinese
China Electric Power Press Ltd.
Content preview from Python文本分析
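The goal is to link each mention in the text to a unique real-world entity in an ontology. This removes ambiguity: Q64 refers to Berlin, Germany, not to Berlin, New Hampshire (which is Q821244 in Wikidata). This step is essential when you connect information from different sources and build a knowledge base.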
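The disambiguation described above can be sketched with a tiny hand-built lookup table. The two Wikidata IDs come from the text; the `CANDIDATES` table, its context keywords, and the `link_entity` function are hypothetical and only illustrate the idea. A real entity linker would query a knowledge base and use a far richer model of context.

```python
# Hypothetical candidate table: surface form -> list of (Wikidata ID, context keywords).
CANDIDATES = {
    "Berlin": [
        ("Q64", {"germany", "german"}),        # Berlin, Germany
        ("Q821244", {"new", "hampshire"}),     # Berlin, New Hampshire
    ],
}


def link_entity(mention, context):
    """Return the Wikidata ID whose keywords overlap most with the context.

    Returns None for unknown mentions. On a tie, the first candidate wins.
    """
    candidates = CANDIDATES.get(mention)
    if not candidates:
        return None
    # Crude tokenization: lowercase, drop commas and periods, split on whitespace.
    words = set(context.lower().replace(",", " ").replace(".", " ").split())
    best_id, _ = max(candidates, key=lambda c: len(c[1] & words))
    return best_id


print(link_entity("Berlin", "The company opened an office in Berlin, Germany."))
print(link_entity("Berlin", "Berlin, New Hampshire is a small city."))
```

The keyword overlap is a deliberately simple stand-in for context scoring; it only shows why a mention needs its surrounding text to be resolved to a single ID.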
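[Figure 12-2. The information extraction process: cleaned text, named-entity recognition, coreference resolution, entity linking, relation extraction, knowledge graph]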
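The last step is relation extraction, that is, identifying the relationships between entities. In practice, you usually consider only a few relations, because this kind of information is hard to extract correctly from arbitrary text.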
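To make the idea of restricting yourself to a few relations concrete, here is a minimal sketch that extracts a single relation type with a regular expression. This is not the approach used in the book, which builds on spaCy; the pattern, the verb list, and the `extract_acquisitions` function are assumptions chosen for illustration, and the company names in the usage line are made up.

```python
import re

# A company name is approximated as one or more capitalized words.
NAME = r"[A-Z][\w&]*(?:\s[A-Z][\w&]*)*"

# Illustrative pattern: "<Company> acquires|acquired|buys|bought <Company>".
ACQUIRES = re.compile(
    rf"(?P<subj>{NAME})\s(?:acquires|acquired|buys|bought)\s(?P<obj>{NAME})"
)


def extract_acquisitions(text):
    """Return (subject, 'acquires', object) triples found in the text."""
    return [
        (m.group("subj"), "acquires", m.group("obj"))
        for m in ACQUIRES.finditer(text)
    ]


print(extract_acquisitions("Alpha Corp acquired Beta Systems for an undisclosed sum."))
```

A pattern like this breaks easily: a capitalized word at the start of a sentence is absorbed into the subject, and passive constructions are missed entirely. That fragility is one reason to limit the number of relations you try to cover.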
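Finally, you can store the graph in a graph database that serves as the backend of a knowledge-based application. Such graph databases store data either as RDF triples (triple stores) or as property graphs, in which nodes and edges can carry arbitrary attributes. Commonly used graph databases include GraphDB (triple store) as well as Neo4j and Grakn (property graphs).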
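The difference between the two storage models can be shown with plain Python data structures. This is only a sketch of the data shapes, not the API of any graph database; the `ex:` prefix, the company names, and the `source_doc` attribute are made up for illustration.

```python
# 1) RDF-style triples: every fact is a (subject, predicate, object) statement.
triples = [
    ("ex:AlphaCorp", "rdf:type", "ex:Company"),
    ("ex:BetaSystems", "rdf:type", "ex:Company"),
    ("ex:AlphaCorp", "ex:acquires", "ex:BetaSystems"),
]

# 2) Property graph: nodes and edges each carry their own attribute dictionary.
nodes = {
    "alpha": {"label": "Company", "name": "Alpha Corp"},
    "beta": {"label": "Company", "name": "Beta Systems"},
}
edges = [
    # "source_doc" is an arbitrary edge attribute (hypothetical value).
    {"source": "alpha", "target": "beta", "type": "ACQUIRES", "source_doc": "news-001"},
]


def objects(triples, subject, predicate):
    """All objects o for which (subject, predicate, o) is stored."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def neighbors(edges, source, edge_type):
    """All target node IDs reachable from source over edges of the given type."""
    return [e["target"] for e in edges if e["source"] == source and e["type"] == edge_type]


print(objects(triples, "ex:AlphaCorp", "ex:acquires"))
print(neighbors(edges, "alpha", "ACQUIRES"))
```

In the triple representation, attaching extra information to the acquisition itself would require additional statements, whereas the property graph puts it directly on the edge.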
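For each step, you can choose between a rule-based approach and machine learning. We will use the models provided by spaCy, plus some additional rules, but we will not train our own models. Using rules to extract domain-specific knowledge has one advantage: you can get started quickly without any training data. As we will see, the results of the rule-based approach already allow some very meaningful analyses. But if you plan to build a large-scale, corpus-based knowledge base, you will have to train your own models to recognize named entities, detect relations, and link entities.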
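As a rough illustration of why rules let you start without training data, the following sketch tags organization names with one hand-written rule. It is a standard-library stand-in, not spaCy's rule mechanism; the suffix list, the `ORG_RULE` pattern, and the `find_orgs` function are assumptions, and the company names are made up.

```python
import re

# Rule: one or more capitalized words followed by a legal-form suffix.
ORG_RULE = re.compile(r"\b(?:[A-Z][\w&]*\s)+(?:Inc|Corp|Ltd|LLC|AG|GmbH)\b")


def find_orgs(text):
    """Return the substrings that match the organization rule, in order."""
    return [m.group() for m in ORG_RULE.finditer(text)]


print(find_orgs("Shares of Alpha Widgets Inc rose after Beta Corp announced a deal."))
```

The rule works immediately but only for names that carry one of the listed suffixes, which is the kind of coverage gap a trained model is meant to close.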
12.3 Introduction to the Dataset
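Suppose you work in finance and want to follow news about mergers and acquisitions. We want to automatically identify company names and the types of deals they are involved in, and put the results into a knowledge base. In this chapter, we will introduce methods for extracting certain information about companies ...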

Publisher Resources

ISBN: 9787519864446