Python文本分析
by Jens Albrecht, Sidharth Ramachandran, Christian Winkler
August 2022
Intermediate to advanced
441 pages
11h 26m
Chinese
China Electric Power Press Ltd.
Web Scraping and Data Extraction
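import requests

# The excerpt starts mid-listing. The two lines below are assumed context from
# the previous page: `s` is a requests.Session and `urls` is a list of URLs.
s = requests.Session()
for url in urls:
    # get the part after the last / in URL and use as filename
    file = url.split("/")[-1]
    r = s.get(url)
    if r.ok:
        with open(file, "w+b") as f:
            f.write(r.text.encode('utf-8'))
    else:
        print("error with URL %s" % url)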
Output:
CPU times: user 117 ms, sys: 7.71 ms, total: 124 ms
Wall time: 314 ms
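Depending on your internet connection, the code above may take longer to finish, but it should still be fast. The session abstraction gives us persistent HTTP connections (keep-alive), SSL session caching, and similar optimizations, which maximizes download speed.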
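To see the effect of connection reuse for yourself, one option is to time the same downloads with and without a session. The sketch below is an illustration rather than the book's code: the helper name `time_downloads` and the example URLs are assumptions, and the measured times depend on your network and on the server.

```python
import time

import requests


def time_downloads(get, urls):
    """Call get(url) for every URL and return the elapsed wall time in seconds."""
    start = time.perf_counter()
    for url in urls:
        get(url)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder URLs (assumption); substitute the pages you want to fetch.
    urls = ["https://www.example.com/"] * 5

    # Plain requests.get: every call sets up its own connection.
    without_session = time_downloads(
        lambda u: requests.get(u, timeout=10), urls
    )

    # One Session: connections to the same host are kept alive and reused.
    with requests.Session() as s:
        with_session = time_downloads(lambda u: s.get(u, timeout=10), urls)

    print("without session: %.3f s" % without_session)
    print("with session:    %.3f s" % with_session)
```

Passing the fetch function in as `get` keeps the timing loop identical for both variants, so any difference comes from the connection handling alone.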
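Use proper error handling when downloading URLs
When you download URLs, you communicate with remote servers over a network protocol. Many kinds of errors can occur along the way, such as a URL that has changed or a server that does not respond. The example above only prints an error message; in the real world, your solution should be more sophisticated.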
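As a rough idea of what more sophisticated handling could look like, the sketch below adds timeouts, automatic retries for transient server errors, and separate messages for different failure types. It is not the book's implementation: the function names `make_session` and `download`, the retry count, the backoff factor, the list of retried status codes, and the `index.html` fallback filename are all assumptions chosen for illustration.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(retries=3, backoff=0.5):
    """Build a Session that retries transient failures (values are assumptions)."""
    retry = Retry(
        total=retries,
        backoff_factor=backoff,
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


def download(session, url, timeout=10):
    """Save url to a local file; return the filename, or None on failure."""
    # Same filename rule as above, with a fallback for URLs ending in "/".
    file = url.split("/")[-1] or "index.html"
    try:
        r = session.get(url, timeout=timeout)
        r.raise_for_status()  # turn 4xx/5xx responses into exceptions
    except requests.exceptions.Timeout:
        print("timeout for URL %s" % url)
    except requests.exceptions.HTTPError as e:
        print("HTTP error %s for URL %s" % (e.response.status_code, url))
    except requests.exceptions.RequestException as e:
        # Anything else: DNS failure, refused connection, retries exhausted.
        print("error with URL %s: %s" % (url, e))
    else:
        with open(file, "w+b") as f:
            f.write(r.text.encode("utf-8"))
        return file
    return None
```

A caller would typically create one session with `make_session()` and pass it to `download` for every URL, collecting the URLs that returned `None` for a later retry or manual inspection.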
3.10 Case Study: Downloading HTML Pages with wget
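When you need to download pages in bulk, wget (https://oreil.ly/wget) is a very good tool. It is a common command-line tool available for almost every platform. Linux and macOS should already have wget installed, and you can also install it easily with a package manager. For Windows ...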

Publisher Resources

ISBN: 9787519864446