Natural Language Processing with Python and NLTK (Python和NLTK实现自然语言处理)

by Posts & Telecom Press, Nitin Hardeniya
February 2024
Intermediate to advanced
649 pages
9h 58m
Chinese
Packt Publishing

Chapter 6: Transforming Chunks and Trees

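This chapter covers the following topics.

  • Filtering insignificant words from a sentence.
  • Correcting verb forms.
  • Swapping verb phrases.
  • Swapping noun cardinals.
  • Swapping infinitive phrases.
  • Singularizing plural nouns.
  • Chaining chunk transformations.
  • Converting a chunk tree to text.
  • Flattening a deep tree.
  • Creating a shallow tree.
  • Converting tree labels.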

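Now that you know how to get chunks or phrases from a sentence, what can you do with them? This chapter shows how to apply various transformations to chunks and trees. Chunk transformations correct grammatical errors and rearrange phrases without losing meaning. Tree transformations give you ways to modify and flatten deep parse trees. The functions detailed in this chapter modify data rather than learn from it, which means it is not safe to apply them indiscriminately. A thorough understanding of the data you want to transform, together with some experimentation, will help you decide which functions to apply and when.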

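In this chapter, the term chunk refers either to an actual chunk extracted by a chunker or simply to a short phrase or sentence represented as a list of tagged words. What matters here is what you can do with a chunk, not where it came from.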
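As a minimal illustration of the second sense of the term, the sketch below writes a chunk as a plain Python list of (word, tag) tuples. The example phrase and its Penn Treebank style tags are chosen here for illustration and are not taken from the chapter.

```python
# A chunk written as a list of (word, tag) tuples.
# The tags follow the Penn Treebank convention and were assigned by hand
# for this illustration.
chunk = [("the", "DT"), ("movie", "NN"), ("was", "VBD"), ("terrible", "JJ")]

# Splitting the chunk into its words and its tags is a one-line comprehension.
words = [word for word, tag in chunk]
tags = [tag for word, tag in chunk]

print(words)
print(tags)
```

Any function that accepts a list of this shape works the same way whether the list came from a chunker or was typed in by hand.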

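Many of the most commonly used words carry little weight when it comes to distinguishing the meaning of a phrase. For example, in the phrase the movie was terrible, the most significant words are movie and terrible, while the and was are almost useless. Remove them and you still get the same message: movie terrible or terrible movie. Either way, the sentiment is the same. This section shows how to drop the insignificant words and keep the significant ones by looking at their part-of-speech tags.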

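First, you need to decide which part-of-speech tags are significant and which are not. Looking through the treebank corpus for stopwords yields the following table of insignificant words and tags.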

Word    Tag
a       DT
all     PDT
an      DT
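One way to act on a table like this is to drop every tagged word whose tag ends with an insignificant suffix. The following is a minimal sketch under stated assumptions, not necessarily the function the book goes on to define: the name filter_insignificant, the tag_suffixes parameter, and the default suffix "DT" are illustrative choices. Matching on the suffix "DT" covers both the DT and PDT tags listed above.

```python
def filter_insignificant(chunk, tag_suffixes=("DT",)):
    """Return the (word, tag) pairs whose tag ends with none of tag_suffixes.

    The function name and the default suffix are assumptions made for this
    sketch. "DT" matches both DT and PDT because PDT ends with DT.
    """
    suffixes = tuple(tag_suffixes)  # str.endswith accepts a tuple of suffixes
    return [(word, tag) for word, tag in chunk if not tag.endswith(suffixes)]


chunk = [("the", "DT"), ("movie", "NN"), ("was", "VBD"), ("terrible", "JJ")]

# With the default suffix, only the determiner is removed.
print(filter_insignificant(chunk))

# Adding "VBD" also drops "was". This removes every past-tense verb, so it is
# only appropriate when such verbs carry no meaning in your data.
print(filter_insignificant(chunk, tag_suffixes=("DT", "VBD")))
```

Filtering by tag suffix rather than by word keeps the rule short, but it is only as reliable as the tagger that produced the tags, so it is worth checking the output on a sample of your own data.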


Publisher Resources

ISBN: 9781835083451