Python文本分析

by Jens Albrecht, Sidharth Ramachandran, Christian Winkler
August 2022
Intermediate to advanced
441 pages
11h 26m
Chinese
China Electric Power Press Ltd.
Content preview from Python文本分析
2.4.5 Extracting Data Through a Streaming API
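Some APIs provide data in near real time, which we call streaming data. In this case the API pushes the data to us, rather than waiting for us to send a get request as in the earlier examples. One example is the Twitter Streaming API. It provides a sample of tweets as they are posted in real time, and the sample can be filtered on several criteria. Because this is a continuous stream of data, we have to handle the extraction process differently.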
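The difference between pulling and pushing can be sketched with a plain callback. This is only an illustration: `FakeStream`, `subscribe`, and `emit` are hypothetical names invented here and are not part of the Twitter API or Tweepy.

```python
from typing import Callable, List


class FakeStream:
    """Hypothetical stand-in for a streaming API that pushes items to subscribers."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        # The consumer registers once; it never asks for data again.
        self._subscribers.append(callback)

    def emit(self, item: str) -> None:
        # The producer decides when data arrives and calls every subscriber.
        for callback in self._subscribers:
            callback(item)


received: List[str] = []
stream = FakeStream()
stream.subscribe(received.append)

# In a real stream these calls would be triggered by the remote service.
stream.emit("tweet 1")
stream.emit("tweet 2")
```

With a get-style API the consumer would instead call a fetch function in a loop; here the consumer only supplies the callback and the stream drives the control flow.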
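Tweepy's StreamListener class provides the basic functionality. The class has an on_data function that is called every time the streaming API pushes a new tweet, and we can customize this function to implement our own logic.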
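The idea of customizing on_data can be sketched without importing Tweepy. In the snippet below, `BaseListener` and `CollectTextListener` are hypothetical classes that only mimic the callback shape described above. It is also an assumption that each pushed message is a JSON string with a `text` field.

```python
import json
from typing import List


class BaseListener:
    """Hypothetical base class: on_data is called once per pushed message."""

    def on_data(self, raw_data: str) -> bool:
        # Default behaviour: ignore the message and keep listening.
        return True


class CollectTextListener(BaseListener):
    """Overrides on_data to keep only the text of each incoming message."""

    def __init__(self) -> None:
        self.texts: List[str] = []

    def on_data(self, raw_data: str) -> bool:
        # Assumption: the payload is JSON and may contain a "text" field.
        tweet = json.loads(raw_data)
        self.texts.append(tweet.get("text", ""))
        return True


listener = CollectTextListener()
listener.on_data('{"id": 1, "text": "hello stream"}')
listener.on_data('{"id": 2}')  # no text field: stored as an empty string
```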
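Staying with the cryptocurrency example, suppose we want a continuously updated view of what people think about different cryptocurrencies to help us make trading decisions. To do this, we need to track real-time tweets that mention cryptocurrencies and keep updating their popularity. As researchers, on the other hand, we are also interested in how users react during major live events such as the Super Bowl or the announcement of election results. In that case we need to listen for the whole duration of the event and save the results for later analysis.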
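One simple way to keep a running popularity count is to increment a counter whenever an incoming tweet mentions a tracked term. The sketch below assumes that popularity means the number of tweets mentioning a keyword. The keyword list and the `update_popularity` helper are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Iterable

# Hypothetical list of tracked terms.
KEYWORDS = ("bitcoin", "ethereum", "dogecoin")


def update_popularity(counts: Counter, text: str,
                      keywords: Iterable[str] = KEYWORDS) -> None:
    """Add one to the count of every keyword that appears in the tweet text."""
    lowered = text.lower()
    for keyword in keywords:
        if keyword in lowered:
            counts[keyword] += 1


counts: Counter = Counter()
sample_tweets = [
    "Bitcoin is up again",
    "Selling my ethereum for bitcoin",
    "Nice weather today",
]
for text in sample_tweets:
    update_popularity(counts, text)
```

Substring matching is crude (it would also match a keyword inside a longer word), so a real tracker would likely tokenize the text first.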
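To keep our solution generic, we create the FileStreamListener class, shown below, which manages all the actions to perform as tweets arrive through the stream. The on_data method is called for every tweet that the Twitter API pushes. In our implementation, incoming tweets are grouped into batches of 100 and written to a file together with a timestamp. You can set the batch size (100 here) according to the amount of memory on your system: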
from datetime ...
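The batching behaviour described above can be sketched independently of Tweepy. This is not the book's FileStreamListener: `BatchFileWriter`, its parameters, and the file-naming scheme are assumptions made for illustration, and the tweets are generated locally rather than received from a stream.

```python
import json
import os
import tempfile
from datetime import datetime
from typing import List


class BatchFileWriter:
    """Hypothetical helper: buffers raw tweets and writes each full batch to a file."""

    def __init__(self, out_dir: str, batch_size: int = 100) -> None:
        self.out_dir = out_dir
        self.batch_size = batch_size
        self.buffer: List[str] = []
        self.files_written: List[str] = []

    def on_data(self, raw_data: str) -> bool:
        # Called once per incoming tweet; flush when the batch is full.
        self.buffer.append(raw_data)
        if len(self.buffer) >= self.batch_size:
            self.flush()
        return True

    def flush(self) -> None:
        # Write the buffered tweets, one per line, to a timestamped file.
        if not self.buffer:
            return
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
        name = f"tweets_{stamp}_{len(self.files_written)}.jsonl"
        path = os.path.join(self.out_dir, name)
        with open(path, "w", encoding="utf-8") as handle:
            for raw in self.buffer:
                handle.write(raw.rstrip("\n") + "\n")
        self.files_written.append(path)
        self.buffer = []


out_dir = tempfile.mkdtemp()
writer = BatchFileWriter(out_dir, batch_size=100)
for i in range(250):
    writer.on_data(json.dumps({"id": i, "text": f"tweet {i}"}))
writer.flush()  # write the final, partially filled batch
```

A larger batch size means fewer files and fewer writes but more tweets held in memory, and more tweets lost if the process stops before a flush.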
Publisher Resources

ISBN: 9787519864446