NLTK应用开发指南 (NLTK Application Development Guide)

by Posts & Telecom Press, Nitin K Hardeniya
May 2024
Intermediate to advanced
172 pages
2h 39m
Chinese
Packt Publishing

Chapter 4 Parsing Text Structure

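This chapter aims to give you a better understanding of the deeper structure of text, to teach you concrete methods for parsing it, and to show how to use those methods in different NLP applications. So far we have worked through the various NLP preprocessing steps; it is now time to move on to deeper text processing. The structure of language is highly complex and has to be described at each of the levels at which it is processed. This chapter walks through these levels of text structure, explains how they differ, and covers some of them in detail. We will also discuss context-free grammar (CFG) and its implementation in the NLTK library, survey a range of text parsers, and show how to use some of the parsing methods NLTK already provides. Specifically, we will use NLTK to write a shallow parser, revisiting named entity recognition (NER) in the context of chunking, and we will look in detail at the options NLTK offers for deep structural analysis of text. Throughout, we try to provide real-world information extraction use cases that show what each of these topics is good for. By the end of the chapter, you should have a working understanding of all of them.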
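To make the idea of a context-free grammar concrete, here is a minimal, library-free sketch of a CYK recognizer: it decides whether a grammar can derive a sentence by filling a table of which nonterminals can span each substring. The toy grammar and the names `BINARY_RULES`, `LEXICAL_RULES` and `cyk_recognize` are hypothetical illustrations, not code from this book, and the grammar is assumed to be in Chomsky normal form (every rule is either A -> B C or A -> word). In NLTK itself, a grammar would typically be written with `nltk.CFG.fromstring` and handed to a parser such as `nltk.ChartParser`; the sketch avoids the library only so the mechanics stay visible.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (CNF): every rule is either
# A -> B C (binary) or A -> 'word' (lexical). Not taken from the book.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("NP", "PP"): {"NP"},
    ("V", "NP"): {"VP"},
    ("VP", "PP"): {"VP"},
    ("P", "NP"): {"PP"},
}
LEXICAL_RULES = {
    "the": {"Det"}, "a": {"Det"},
    "dog": {"N"}, "man": {"N"}, "park": {"N"},
    "saw": {"V"}, "in": {"P"},
}


def cyk_recognize(tokens, start="S"):
    """Return True if the toy CNF grammar derives `tokens` from `start`."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] holds every nonterminal that can span tokens[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(tokens):
        table[i][i] = set(LEXICAL_RULES.get(word, ()))
    for length in range(2, n + 1):           # span length, shortest first
        for i in range(n - length + 1):      # span start
            j = i + length - 1               # span end
            for k in range(i, j):            # split: tokens[i..k] + tokens[k+1..j]
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n - 1]


print(cyk_recognize("the man saw a dog in the park".split()))  # True
print(cyk_recognize("man the saw".split()))                    # False
```

Because every span is built from two shorter spans already in the table, the work grows cubically with sentence length (times a factor for grammar size) instead of exploding over all possible derivations; dynamic programming of this kind is what chart parsers rely on.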

This chapter covers the following topics:

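  • First, what text parsing is and what it means in the context of NLP.
  • Next, the various kinds of text parsers, and how to perform parsing with NLTK.
  • Finally, the role that text parsing plays in information extraction.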

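In deep (full) parsing, grammatical machinery such as CFGs, PCFGs (probabilistic context-free grammars), and search strategies is generally used to assign a complete syntactic structure to a sentence. Shallow parsing, by contrast, is a limited parsing task that recovers only part of the syntactic information in a given text. Deep parsing (deep ...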
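The contrast between full and shallow parsing can be illustrated with a small noun-phrase chunker. This is a hedged, library-free sketch rather than the book's code: it hard-codes a single rule (an optional determiner `DT`, any number of adjectives `JJ`, then one or more nouns whose tags start with `NN`), the same idea NLTK expresses with `nltk.RegexpParser` and a grammar such as `NP: {<DT>?<JJ>*<NN.*>+}`. The function names `np_chunk_iob` and `np_phrases` are hypothetical, the input is assumed to be already POS-tagged, and the output uses flat IOB labels (`B-NP`, `I-NP`, `O`) with no nested structure.

```python
# Hypothetical single chunk rule, in the spirit of the NLTK pattern
# "NP: {<DT>?<JJ>*<NN.*>+}": optional DT, any number of JJ, one or more NN*.
def np_chunk_iob(tagged):
    """Label each (word, tag) pair with an IOB chunk tag: B-NP, I-NP or O."""
    out = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":                        # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":           # any number of adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # one or more nouns
            k += 1
        if k > j:                                       # a noun was found: emit an NP
            for pos in range(i, k):
                word, tag = tagged[pos]
                out.append((word, tag, "B-NP" if pos == i else "I-NP"))
            i = k
        else:                                           # no NP starts here
            word, tag = tagged[i]
            out.append((word, tag, "O"))
            i += 1
    return out


def np_phrases(iob):
    """Join the words of each NP chunk into a single string."""
    phrases = []
    for word, _tag, label in iob:
        if label == "B-NP":
            phrases.append([word])
        elif label == "I-NP":
            phrases[-1].append(word)
    return [" ".join(words) for words in phrases]


sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(np_phrases(np_chunk_iob(sentence)))  # ['the little yellow dog', 'the cat']
```

The chunker never builds structure above the noun phrases, so it cannot say how "at the cat" attaches to "barked"; that attachment is exactly what a full parse supplies and what a shallow parse gives up in exchange for a simpler, typically faster and more robust analysis.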

ISBN: 9781836205913