Python文本分析

by Jens Albrecht, Sidharth Ramachandran, Christian Winkler
August 2022
Intermediate to advanced
441 pages
11h 26m
Chinese
China Electric Power Press Ltd.
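... then summing these scores gives the overall sentiment. This technique removes the need to read every review manually and assign it a sentiment label. Instead, we rely on a lexicon, which provides an expert-assigned sentiment score for each word. In the first blueprint we use the Bing Liu lexicon, but you are free to extend our approach to other lexicons. Lexicons generally contain several variants of a word and exclude stop words, so the standard preprocessing steps are not needed. Only words that appear in the lexicon are actually scored. This also leads to one drawback of the approach, which we discuss at the end of the blueprint.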
11.4.1 Bing Liu lexicon
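The Bing Liu lexicon was compiled by dividing words into two groups: one expressing positive sentiment and one expressing negative sentiment. The lexicon also contains misspelled words, which makes it better suited to text extracted from online forums, social media, and similar sources, so it should give good results on the Amazon customer review data.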
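You can download the Bing Liu lexicon as a zip file from the author's website (https://oreil.ly/A_O4Q); it contains one set of words expressing positive sentiment and one set expressing negative sentiment. The NLTK library also provides this lexicon as a corpus, which we can use after downloading it. Once the lexicon is extracted, we create a dictionary that holds the lexicon's words and their corresponding sentiment scores. The next step is to generate a score for each review in the dataset. First, we convert the text to lowercase; then we use the word_tokenize function from the NLTK package to split the sentences into words and check whether our dictionary contains each word. If it does, we add the word's sentiment score to the total sentiment score. Finally, we ...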
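The steps described above (build a dictionary, lowercase, tokenize, look up, sum) can be sketched roughly as follows. This is a minimal illustration, not the book's own listing. The helper names build_lexicon, simple_tokenize, score_review, and load_bing_liu_from_nltk are hypothetical, the scores of +1 and -1 are an assumption (the excerpt does not state the values), and the regex tokenizer is a stand-in so the sketch runs without downloading any NLTK data. The sketch stops at the raw sum, because the excerpt is cut off before the final step.

```python
import re


def build_lexicon(positive_words, negative_words):
    """Map each lexicon word to a score.

    Assumption: +1 for positive words and -1 for negative words.
    A word present in both lists ends up with the negative score.
    """
    lexicon = {word: 1 for word in positive_words}
    lexicon.update({word: -1 for word in negative_words})
    return lexicon


def simple_tokenize(text):
    """Stand-in tokenizer (assumption): runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text)


def score_review(text, lexicon, tokenize=simple_tokenize):
    """Lowercase the text, tokenize it, and sum the scores of known words."""
    total = 0
    for token in tokenize(text.lower()):
        # Words that are not in the lexicon contribute nothing.
        total += lexicon.get(token, 0)
    return total


def load_bing_liu_from_nltk():
    """Build the dictionary from NLTK's copy of the lexicon.

    Requires the nltk package and a one-time corpus download.
    """
    import nltk
    from nltk.corpus import opinion_lexicon

    nltk.download("opinion_lexicon", quiet=True)
    return build_lexicon(opinion_lexicon.positive(), opinion_lexicon.negative())


# Toy word lists for illustration only; they are not the real lexicon.
toy_lexicon = build_lexicon(["good", "great", "love"], ["bad", "poor", "broken"])
review = "Great sound, but the case arrived broken. Still, I love it!"
print(score_review(review, toy_lexicon))  # prints 1
```

To follow the text more closely, pass tokenize=nltk.word_tokenize (which needs NLTK's tokenizer data to be downloaded) and use load_bing_liu_from_nltk() in place of the toy lexicon.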
Publisher Resources

ISBN: 9787519864446