Python和NLTK实现自然语言处理

by Posts & Telecom Press, Nitin Hardeniya
February 2024
Intermediate to advanced
649 pages
9h 58m
Chinese
Packt Publishing

Chapter 6 Text Classification

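The previous chapter covered some of the most common NLP tools and preprocessing steps. In this chapter, we use most of what we learned in the earlier chapters to build one of the most mature NLP applications. We present a generic approach to text classification and show how to build a text classifier from scratch in just a few lines of code. We also provide a cheat sheet of the classification algorithms in the context of text classification.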

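This chapter discusses some of the most common text classification algorithms, but only briefly. If you want the detailed mathematics behind them, many online resources and books are available. We try to give you what you need to know so that you can start from working code snippets. Although text classification is an excellent NLP use case, in this chapter we use scikit-learn rather than NLTK for classification: it offers a wider range of classification algorithms, and its code base uses memory more efficiently for text mining.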
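As a small illustration of the memory point, the sketch below checks how scikit-learn stores a bag-of-words matrix. The sample documents are made up for this example, and linking the text's memory claim to sparse storage is an assumption on our part rather than something the chapter states.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy documents, invented for illustration only.
docs = [
    "free prize claim now",
    "meeting at noon tomorrow",
    "claim your free prize",
]

# fit_transform learns the vocabulary and returns a document-term matrix.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

n_docs, n_terms = X.shape
# Fraction of cells that actually hold a count; the rest are implicit zeros.
density = X.nnz / (n_docs * n_terms)

print("sparse matrix:", issparse(X))
print("shape:", X.shape, "stored values:", X.nnz, "density:", round(density, 2))
```

With a realistic corpus the vocabulary runs to many thousands of terms while each document uses only a few of them, so storing only the nonzero counts saves a great deal of memory.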

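This chapter covers the following topics:

  • All the text classification algorithms.
  • How to build a text classifier with an end-to-end pipeline, and how to implement a text classifier with scikit-learn and NLTK.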
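To give a feel for what an end-to-end pipeline can look like, here is a minimal sketch using scikit-learn alone. The toy messages, the labels, and the choice of TF-IDF features with a multinomial Naive Bayes classifier are assumptions made for illustration; they are not necessarily the steps the chapter itself uses, and NLTK preprocessing is left out to keep the example self-contained.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy training set, invented for illustration only.
train_texts = [
    "win a free prize now",
    "claim your free cash prize",
    "free entry win cash",
    "are we meeting for lunch today",
    "see you at the office tomorrow",
    "can you send me the report",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Chain feature extraction and the classifier so that raw text goes in
# and predicted labels come out.
text_clf = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("clf", MultinomialNB()),
])
text_clf.fit(train_texts, train_labels)

# The same pipeline object vectorizes and classifies unseen text.
predictions = text_clf.predict(["free cash prize", "lunch at the office"])
print(list(predictions))
```

Wrapping both steps in a single `Pipeline` means the vocabulary learned during `fit` is reused automatically at prediction time, which avoids a common source of train/test mismatch.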

The following is the scikit-learn machine learning cheat sheet.

[Figure: scikit-learn machine learning cheat sheet]

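As you follow the flow shown in the cheat sheet, it indicates which algorithm suits which kind of problem and, depending on the number of labeled samples, when to move from one classifier to another. Following this cheat sheet is a good starting point for building practical applications, and in most cases it works. Although scikit-learn also handles other kinds of data, we focus mainly on text data. Using examples, this chapter explores text classification, text clustering, and topic detection in text (dimensionality reduction), and builds some interesting NLP applications. Because the web already offers plenty of resources, this chapter does not go deeper into the general concepts of machine learning, classification, and clustering; instead, we give more detail on all of these concepts in the context of a text corpus. Before that, here is a brief refresher on some concepts. ...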
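The idea of moving from one classifier to another as the labeled sample grows can be sketched as a small helper. The figure is not reproduced in this preview, so the thresholds (50 and 100,000 samples) and the order of candidates below follow our reading of scikit-learn's published "Choosing the right estimator" flowchart and should be treated as assumptions. The function name `suggest_classifiers` is hypothetical, and `MultinomialNB` is our pick for the flowchart's generic "Naive Bayes" box.

```python
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC


def suggest_classifiers(n_labeled_samples, is_text=True):
    """Return candidate classifiers in the order one might try them.

    Hypothetical helper: thresholds are assumed from the scikit-learn
    estimator flowchart, not taken from the chapter itself.
    """
    if n_labeled_samples < 50:
        # The flowchart's advice for very small datasets is to get more data.
        raise ValueError("too few labeled samples; collect more data first")
    if n_labeled_samples >= 100_000:
        # Large datasets: start with a linear model trained by SGD.
        return [SGDClassifier()]
    # Smaller datasets: try a linear SVM first.
    candidates = [LinearSVC()]
    if is_text:
        # For text data, Naive Bayes is the suggested fallback.
        candidates.append(MultinomialNB())
    return candidates


small = [type(c).__name__ for c in suggest_classifiers(5_000)]
large = [type(c).__name__ for c in suggest_classifiers(200_000)]
print(small, large)
```

A helper like this is only a starting heuristic; in practice the choice should be confirmed by cross-validating the candidates on the actual corpus.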


Publisher Resources

ISBN: 9781835083451