Natural Language Processing with Python and NLTK (Python和NLTK实现自然语言处理)

by Posts & Telecom Press, Nitin Hardeniya
February 2024
Intermediate to advanced
649 pages
9h 58m
Chinese
Packt Publishing

Preface

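NLTK is one of the most popular and widely used libraries in the natural language processing (NLP) community. Its strength lies in its simplicity: most complex NLP tasks can be implemented in a few lines of code. This book covers how to tokenize text into individual words; how to use the WordNet lexical dictionary; how and when to stem or lemmatize words; how to replace words and correct spelling; how to create your own custom text corpora and corpus readers, including a MongoDB-backed corpus; how to use part-of-speech taggers to tag words; how to create and transform chunked phrase trees using partial parsing; how to perform feature extraction for text classification and sentiment analysis; how to do parallel and distributed text processing; and how to store word distributions in Redis.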

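This hands-on, learn-by-doing approach will teach you all of this and more. The book will help you become an expert in using NLTK for natural language processing.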

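Module 1 discusses all the preprocessing steps required in text mining and NLP tasks. It covers tokenization, stemming, stop word removal, and other text cleaning processes in detail, and shows how easily they can be implemented in NLTK.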
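As a rough illustration of the preprocessing steps named for Module 1, the sketch below chains tokenization, stop word removal, and stemming. It is not taken from the book. The `preprocess` helper and the tiny `STOP_WORDS` set are hypothetical names introduced here; `wordpunct_tokenize` and `PorterStemmer` are NLTK components that need no extra data download.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Hypothetical, deliberately tiny stop word list for illustration only.
# NLTK ships a fuller list in nltk.corpus.stopwords, which needs a data download.
STOP_WORDS = {"the", "are", "over", "a", "an", "of", "and"}


def preprocess(text):
    """Tokenize, lowercase, drop stop words and punctuation, then stem."""
    stemmer = PorterStemmer()
    # Split into runs of word characters and runs of punctuation.
    tokens = wordpunct_tokenize(text.lower())
    # Keep alphabetic tokens that are not in the stop word list.
    kept = [t for t in tokens if t.isalpha() and t not in STOP_WORDS]
    # Reduce each remaining word to its Porter stem.
    return [stemmer.stem(t) for t in kept]


tokens = preprocess("The cats are running over the fences.")
print(tokens)
```

Stemming is a heuristic suffix-stripping step, so some stems are not dictionary words; lemmatization is the usual alternative when readable base forms matter.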

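Module 2 explains how to use corpus readers and create custom corpora. It also introduces some of the corpora that come with NLTK. It covers chunking (also known as partial parsing), which identifies phrases and named entities in a sentence, and it explains how to train your own custom chunker and create specific named entity recognizers.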
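To make the idea of chunking concrete, here is a minimal sketch of a rule-based noun phrase chunker built with NLTK's `RegexpParser`. It is an illustration under stated assumptions, not the book's own code: the sentence is tagged by hand so no tagger model is required, and the one-rule grammar is a simple example rather than a trained chunker.

```python
from nltk.chunk import RegexpParser

# Hand-tagged (word, POS) pairs, so no tagger model download is needed.
tagged = [
    ("the", "DT"), ("little", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"),
    ("the", "DT"), ("cat", "NN"),
]

# Illustrative grammar: an NP is an optional determiner,
# any number of adjectives, then a singular noun.
grammar = "NP: {<DT>?<JJ>*<NN>}"
chunker = RegexpParser(grammar)

# parse() returns a tree whose NP subtrees are the chunks.
tree = chunker.parse(tagged)

noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees()
    if subtree.label() == "NP"
]
print(noun_phrases)  # ['the little dog', 'the cat']
```

The words outside any NP chunk ("barked", "at") stay as plain leaves of the root tree, which is what makes this a partial rather than a full parse.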

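Module 3 discusses how to calculate word frequencies and implement various language modeling techniques. It also discusses the concept and application of shallow semantic analysis (that is, NER) and word sense disambiguation (WSD) using WordNet.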

Module 3 also helps you understand and apply the concepts of information retrieval and text summarization.
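The word frequency calculation mentioned for Module 3 can be sketched with NLTK's `FreqDist`. This is a minimal example on a made-up sentence, assuming simple punctuation-aware tokenization is good enough; it is not the book's own code.

```python
from nltk import FreqDist
from nltk.tokenize import wordpunct_tokenize

# Made-up sample text for illustration.
text = "the quick brown fox jumps over the lazy dog and the fox runs away"

# FreqDist counts how often each token occurs.
freq = FreqDist(wordpunct_tokenize(text))

print(freq["the"])          # 3
print(freq.most_common(2))  # [('the', 3), ('fox', 2)]
print(freq.N())             # 14 tokens in total
```

Counts like these are the raw material for the language modeling techniques the module goes on to describe, such as estimating word probabilities from relative frequencies.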

The software and hardware required for Module 1 are listed in the table below.

Chapter | Software required | Free/proprietary | Download site | Hardware specification | OS required
Chapters 1-5 | Python/Anaconda ...

Publisher Resources

ISBN: 9781835083451