Natural Language Processing with Python and NLTK (Python和NLTK实现自然语言处理)

by Posts & Telecom Press, Nitin Hardeniya
February 2024
Intermediate to advanced
649 pages
9h 58m
Chinese
Packt Publishing

Chapter 3 Morphology: Giving It a Try

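We define morphology as the study of how words are composed from morphemes. A morpheme is the smallest unit of language that carries meaning. This chapter discusses stemming and lemmatization, stemmers and lemmatizers for non-English languages, and how to develop a morphological analyzer and a morphological generator using machine learning tools, search engines, and many related concepts.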

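In short, this chapter covers the following topics.

  • Morphology.
  • Stemmers.
  • Lemmatization.
  • Developing a stemmer for non-English languages.
  • Morphological analyzers.
  • Morphological generators.
  • Search engines.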

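We define morphology as the study of how tokens are produced with the help of morphemes. A morpheme is the basic unit of language that carries meaning. There are two types of morphemes: stems and affixes (suffixes, prefixes, infixes, and circumfixes).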

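Because a stem can exist without any affix attached, stems are also called free morphemes. Because affixes cannot exist in free form and always occur together with a free morpheme, affixes are also called bound morphemes. Consider the word unbelievable. Here, believe is the stem, or free morpheme, and can stand on its own. The morphemes un and able are affixes, or bound morphemes. They cannot exist in free form, but they can occur together with a stem.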
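The split of unbelievable into a stem and two affixes can be sketched in a few lines of Python. This is only an illustration, not the book's method: the affix lists and the function name `split_affixes` are hypothetical, and the rule simply strips at most one known prefix and one known suffix.

```python
# Hypothetical, tiny affix inventories chosen only for this illustration.
PREFIXES = ("un", "re", "dis")
SUFFIXES = ("able", "ness", "ing", "ed")


def split_affixes(word):
    """Return (prefix, stem, suffix), stripping at most one known prefix and suffix."""
    # Pick the first listed prefix the word starts with, or "" if none matches.
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    rest = word[len(prefix):]
    # Pick the first listed suffix, but never let it consume the whole remainder.
    suffix = next(
        (s for s in SUFFIXES if rest.endswith(s) and len(rest) > len(s)), ""
    )
    stem = rest[:len(rest) - len(suffix)] if suffix else rest
    return prefix, stem, suffix


print(split_affixes("unbelievable"))  # ('un', 'believ', 'able')
```

The stem comes out as believ rather than believe because the final e is dropped before able. A naive rule like this also mis-splits words such as under, which is one reason real stemmers rely on more careful rule sets.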

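There are three types of languages: isolating, agglutinative, and inflecting. Morphology means something different in each of them. Isolating languages have only free morphemes, and these carry no information about tense (past, present, or future) or number (singular or plural). Mandarin Chinese is an example of an isolating language. Agglutinative languages combine small words to convey compound information. Turkish is an example of an agglutinative language. Inflecting languages break words down into simpler units, each of which carries a different meaning. Latin is an example of an inflecting language.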

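Morphological processes are of the following kinds: inflection, derivation, semi-affixes, combining forms, and cliticization. Inflection transforms a word into a form that expresses person, number, tense, gender, case, aspect, and mood.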
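As a rough illustration of inflection, the sketch below generates a few forms of a regular English verb from its base form. The function name `inflect_verb` and the feature labels `3sg`, `past`, and `prog` are assumptions made for this example, and the spelling rules are deliberately naive: irregular verbs and consonant doubling are not handled.

```python
def inflect_verb(lemma, feature):
    """Inflect a regular English verb for one of a few features (naive rules)."""
    if feature == "3sg":  # third person singular, present tense
        sibilant = lemma.endswith(("s", "sh", "ch", "x", "z"))
        return lemma + ("es" if sibilant else "s")
    if feature == "past":  # simple past tense
        return lemma + ("d" if lemma.endswith("e") else "ed")
    if feature == "prog":  # progressive aspect
        # Drop a single silent final "e" (believe -> believing), but keep "ee".
        drop_e = lemma.endswith("e") and not lemma.endswith("ee")
        return (lemma[:-1] if drop_e else lemma) + "ing"
    raise ValueError(f"unsupported feature: {feature}")


for feature in ("3sg", "past", "prog"):
    print(feature, inflect_verb("believe", feature))
```

Each feature label selects one inflected form of the same verb, so the output differs in tense, person, or aspect while the underlying word stays the same.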

Publisher Resources

ISBN: 9781835083451