Natural Language Processing with Python and NLTK (Python和NLTK实现自然语言处理)

by Posts & Telecom Press, Nitin Hardeniya
February 2024
Intermediate to advanced
649 pages
9h 58m
Chinese
Packt Publishing

Chapter 4: Part-of-Speech Tagging

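This chapter covers the following topics.

  • Default tagging.
  • Training a unigram part-of-speech tagger.
  • Combining taggers with backoff tagging.
  • Training and combining n-gram taggers.
  • Creating a model of likely word tags.
  • Tagging with regular expressions.
  • Affix tagging.
  • Training a Brill tagger.
  • Training a TnT tagger.
  • Using WordNet for tagging.
  • Tagging proper names.
  • Classifier-based tagging.
  • Training a tagger with NLTK-Trainer.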

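Part-of-speech tagging is the process of converting a sentence, represented as a list of words, into a list of tuples, where each tuple has the form (word, tag). The tag is a part-of-speech tag that indicates whether the word is a noun, an adjective, a verb, and so on.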
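To make the input and output shapes concrete, here is a minimal sketch in plain Python. The sentence and its Penn Treebank-style tags are assigned by hand for illustration; no tagger is involved yet.

```python
# A tokenized sentence is just a list of words.
words = ["The", "duck", "swims"]

# Hand-assigned tags, one per word (illustrative, Penn Treebank style).
tags = ["DT", "NN", "VBZ"]

# A tagged sentence is a list of (word, tag) tuples.
tagged = list(zip(words, tags))
print(tagged)
```

Every tagger discussed in this chapter produces output in this shape.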

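Part-of-speech tagging is a necessary step before chunking, which is covered in Chapter 5 of this module. Without part-of-speech tags, a chunker cannot know how to extract phrases from a sentence. With them, you can tell the chunker how to identify phrases based on tag patterns.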
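As a small preview of why the tags matter, the sketch below uses NLTK's `RegexpParser` with a single tag pattern to pull a noun phrase out of an already tagged sentence. The sentence, the tags, and the grammar are illustrative assumptions, not taken from the book.

```python
from nltk.chunk import RegexpParser

# A hand-tagged sentence (illustrative).
tagged = [("the", "DT"), ("little", "JJ"), ("duck", "NN"), ("swims", "VBZ")]

# Tag pattern: optional determiner, any number of adjectives, then a noun.
grammar = "NP: {<DT>?<JJ>*<NN>}"
chunker = RegexpParser(grammar)

# parse() returns a tree; matched spans become NP subtrees.
tree = chunker.parse(tagged)
noun_phrases = [
    subtree.leaves() for subtree in tree.subtrees() if subtree.label() == "NP"
]
print(noun_phrases)
```

The pattern refers only to tags, never to the words themselves, which is why the chunker cannot work on untagged input.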

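Part-of-speech tags can also be used for grammar analysis and word sense disambiguation. For example, the word duck can refer to a bird, or it can be a verb describing a downward motion. Without additional information, such as a part-of-speech tag, a computer cannot tell which meaning is intended. For more on word sense disambiguation, see Wikipedia.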
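The duck example can be sketched as a lookup keyed on both the word and its tag. The `SENSES` table and the `lookup_sense` helper are hypothetical names invented for this illustration; real word sense disambiguation is considerably more involved.

```python
# Toy sense inventory keyed by (lowercased word, tag). Entries are illustrative.
SENSES = {
    ("duck", "NN"): "a water bird",
    ("duck", "VB"): "to lower the head or body quickly",
}

def lookup_sense(word, tag):
    """Return the toy sense for a (word, tag) pair, or None if unknown."""
    return SENSES.get((word.lower(), tag))

# The same surface word carries different tags in different sentences.
noun_use = [("The", "DT"), ("duck", "NN"), ("swims", "VBZ")]
verb_use = [("Duck", "VB"), ("under", "IN"), ("the", "DT"), ("branch", "NN")]

print(lookup_sense(*noun_use[1]))  # sense for duck tagged as a noun
print(lookup_sense(*verb_use[0]))  # sense for Duck tagged as a verb
```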

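Most of the taggers covered here are trainable. They use a list of part-of-speech tagged sentences as training data, such as the sentences returned by the tagged_sents() method of the TaggedCorpusReader class (see Section 3.4 of this module for more information). From these training sentences, the tagger builds an internal model that tells it how to tag a word. Other taggers use external data sources or match word patterns to choose a tag for a word.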
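The training step can be sketched with NLTK's `UnigramTagger`. To keep the example self-contained, it trains on two hand-written tagged sentences that are assumed for illustration, in the same shape that `tagged_sents()` returns, rather than on a real corpus.

```python
from nltk.tag import UnigramTagger

# Tiny hand-written training set (illustrative): a list of tagged sentences.
train_sents = [
    [("the", "DT"), ("duck", "NN"), ("swims", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("barks", "VBZ")],
]

# Training builds an internal word-to-tag model from the examples.
tagger = UnigramTagger(train_sents)

seen = tagger.tag(["the", "dog", "swims"])
print(seen)

# A word that never appeared in training gets the tag None.
unseen = tagger.tag(["the", "cat"])
print(unseen)
```

The `None` tag for unseen words is the gap that a fallback tagger is meant to fill.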

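All taggers in NLTK live in the nltk.tag package and inherit from the TaggerI base class. TaggerI requires every subclass to implement a tag() method, which takes a list of words as input and returns a list of tagged words as output. TaggerI also provides an evaluate() method for measuring the accuracy of a tagger (covered at the end of the next section). Many taggers can be combined into a backoff chain (backoff ...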
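The interface described above can be tried out with `DefaultTagger`, the simplest `TaggerI` subclass, which needs no training data. The gold-standard sentence below is assumed for illustration. Recent NLTK releases provide `accuracy()` and mark `evaluate()` as deprecated, so this sketch uses whichever method is available.

```python
from nltk.tag import DefaultTagger

# DefaultTagger assigns the same tag to every word.
tagger = DefaultTagger("NN")

# tag() takes a list of words and returns a list of (word, tag) tuples.
result = tagger.tag(["Hello", "World"])
print(result)

# A hand-tagged gold sentence to score against (illustrative).
gold = [[("Hello", "NN"), ("World", "UH")]]

# Prefer accuracy() where it exists; fall back to evaluate() on older NLTK.
if hasattr(tagger, "accuracy"):
    score = tagger.accuracy(gold)
else:
    score = tagger.evaluate(gold)
print(score)
```

Only one of the two gold tags is NN, so the tagger gets half of the tokens right.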

ISBN: 9781835083451