Natural Language Processing with Python and NLTK (Python和NLTK实现自然语言处理)

by Posts & Telecom Press, Nitin Hardeniya
February 2024
Intermediate to advanced
649 pages
9h 58m
Chinese
Packt Publishing

Chapter 5. Parsing: Analyzing Training Data

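Parsing, also known as syntactic analysis, is one of the tasks in natural language processing (NLP). It is the process of determining whether a sequence of characters written in a natural language conforms to the rules defined by a formal grammar. It breaks a sentence into sequences of words or phrases and assigns each one a constituent category (noun, verb, preposition, and so on).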
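As a minimal sketch of what "conforms to the rules of a formal grammar" can mean in practice, the plain-Python recognizer below fills a CYK-style table for a toy grammar. The grammar, the names `BINARY`, `LEXICAL`, and `recognize`, and the choice of algorithm are assumptions made for illustration; they are not taken from the book and do not use NLTK's API.

```python
from itertools import product

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
# BINARY maps a pair of child categories to the parent categories they can form.
BINARY = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}
# LEXICAL maps a lowercased word to the categories it can belong to.
LEXICAL = {"ram": {"NP", "N"}, "book": {"NP", "N"}, "a": {"Det"}, "bought": {"V"}}


def recognize(tokens, start="S"):
    """Return True if the toy grammar can derive the whole token sequence."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] holds every category that can span tokens[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][i] = set(LEXICAL.get(tok.lower(), ()))
    for width in range(2, n + 1):          # span length
        for i in range(n - width + 1):     # span start
            j = i + width - 1              # span end
            for k in range(i, j):          # split point
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY.get((left, right), set())
    return start in table[0][n - 1]


print(recognize("Ram bought a book".split()))
print(recognize("bought Ram a book".split()))
```

The first call reports that the sentence is derivable from the start symbol; the second reports that the scrambled word order is not. The categories recorded in the table cells are the constituent labels the paragraph above refers to.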

This chapter covers the following topics:

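  • Building a treebank
  • Extracting context-free grammar (CFG) rules from a treebank
  • Creating a probabilistic context-free grammar (PCFG) from a CFG
  • The CYK chart parsing algorithm
  • The Earley chart parsing algorithm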
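To make the second and third topics concrete, here is a hedged plain-Python sketch: it reads CFG rules off a two-tree toy treebank and turns the rule counts into probabilities by relative frequency, which is one common way to estimate a PCFG. The treebank data, the tuple-based tree encoding, and the names `TREEBANK`, `productions`, and `pcfg` are assumptions for illustration; the chapter itself may proceed differently. NLTK ships comparable helpers (`Tree.productions()` and `nltk.induce_pcfg`).

```python
from collections import Counter, defaultdict

# A two-tree toy "treebank" (hypothetical data). A tree is (label, child, ...)
# and a leaf is a plain string.
TREEBANK = [
    ("S",
     ("NP", ("N", "Ram")),
     ("VP", ("V", "bought"), ("NP", ("Det", "a"), ("N", "book")))),
    ("S",
     ("NP", ("Det", "a"), ("N", "book")),
     ("VP", ("V", "fell"))),
]


def productions(tree):
    """Yield one (lhs, rhs) rule for every internal node of the tree."""
    label, *children = tree
    # The right-hand side lists child labels, or the word itself for a leaf.
    yield label, tuple(c if isinstance(c, str) else c[0] for c in children)
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)


# Count how often each rule occurs across the treebank.
counts = Counter(rule for tree in TREEBANK for rule in productions(tree))

# Total number of expansions observed for each left-hand side.
lhs_totals = defaultdict(int)
for (lhs, _), n in counts.items():
    lhs_totals[lhs] += n

# Relative-frequency estimate: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs).
pcfg = {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

for (lhs, rhs), p in sorted(pcfg.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{p:.2f}]")
```

By construction, the probabilities of all rules that share a left-hand side sum to 1, which is the defining constraint of a PCFG.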

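Parsing is one of the steps involved in NLP. It is the process of determining the part-of-speech category of each constituent in a sentence and analyzing whether the sentence conforms to the rules of the grammar. The term parsing derives from the Latin pars (orationis), meaning part of speech.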

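Consider the example Ram bought a book. This sentence is grammatically correct. Now suppose we are given Book bought a Ram instead. Once semantic information is added to the parse tree built for it, we can conclude that the sentence, although syntactically well formed, is not semantically correct. So after a parse tree is generated, meaning is attached to it as a further step.

A parser is software that accepts input text and constructs a parse tree, or syntax tree. Parsing falls into two categories: top-down and bottom-up. Top-down parsing starts from the start symbol and continues until it reaches the individual constituents. Top-down parsers include the recursive descent parser, the LL parser, and the Earley parser (usually classed as top-down because of its prediction step). Bottom-up parsing starts from the individual constituents and continues until it reaches the start symbol. Bottom-up parsers include the operator-precedence parser, the simple precedence parser, the simple LR parser, the LALR parser, the canonical LR (LR(1)) parser, the GLR parser, the CYK (or CKY) parser, the recursive ascent parser, and the shift-reduce parser. ...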
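The two directions can be sketched side by side in plain Python. The code below is an illustrative assumption, not the book's implementation and not NLTK's API: `top_down` is a small backtracking recursive descent parser, `bottom_up` is a greedy shift-reduce parser, and the grammar and all names are hypothetical. The greedy strategy happens to suffice for this toy grammar; in general a shift-reduce parser needs backtracking or a parse table.

```python
# Toy grammar (hypothetical): each nonterminal maps to its alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["N"]],
    "VP": [["V", "NP"]],
    "Det": [["a"]],
    "N": [["ram"], ["book"]],
    "V": [["bought"]],
}


def top_down(symbol, tokens, pos):
    """Recursive descent: expand `symbol` at tokens[pos:], yielding (tree, next_pos)."""
    if symbol not in GRAMMAR:  # terminal: it must match the next token
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:
        partials = [([], pos)]  # (children built so far, position reached)
        for child in rhs:
            partials = [(kids + [tree], nxt)
                        for kids, p in partials
                        for tree, nxt in top_down(child, tokens, p)]
        for kids, end in partials:
            yield (symbol, *kids), end


def parse_top_down(tokens):
    """Return the first tree rooted at S that covers every token, else None."""
    for tree, end in top_down("S", tokens, 0):
        if end == len(tokens):
            return tree
    return None


def bottom_up(tokens):
    """Greedy shift-reduce: shift one token, then reduce while any rule matches."""
    rules = sorted(((lhs, rhs) for lhs, alts in GRAMMAR.items() for rhs in alts),
                   key=lambda rule: -len(rule[1]))  # try longer right-hand sides first
    stack = []  # entries are trees: a token string or (label, child, ...)
    for token in tokens:
        stack.append(token)  # shift
        reduced = True
        while reduced:
            reduced = False
            for lhs, rhs in rules:
                n = len(rhs)
                if n > len(stack):
                    continue
                top = stack[-n:]
                labels = [t if isinstance(t, str) else t[0] for t in top]
                if labels == rhs:
                    stack[-n:] = [(lhs, *top)]  # reduce
                    reduced = True
                    break
    ok = len(stack) == 1 and isinstance(stack[0], tuple) and stack[0][0] == "S"
    return stack[0] if ok else None


good = "ram bought a book".split()
odd = "book bought a ram".split()
print(parse_top_down(good) == bottom_up(good))  # both directions build the same tree
print(bottom_up(odd) is not None)               # syntax alone accepts the odd sentence
```

Both parsers accept Book bought a Ram (lowercased here) because the grammar encodes syntax only; rejecting it requires the semantic information described above.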


Publisher Resources

ISBN: 9781835083451