Python数据处理 (Data Wrangling with Python)

by Jacqueline Kazil, Katharine Jarmul
July 2017
Intermediate to advanced
398 pages
11h 54m
Chinese
Posts & Telecom Press
Advanced Web Scraping: Screen Scrapers and Spiders
You can run this spider the same way you would run an ordinary scraper:
scrapy crawl package
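You have officially completed your first spider! Is there anything left to improve? One easy-to-fix bug remains in this code. Can you find it? How would you fix it? [Hint: check your Python version, then compare the way the version is returned (that is, always as a list) with the data that grab_data returns.] See whether you can fix the problem in the spider script. If not, you can consult the book's repository (https://github.com/jackiekazil/data-wrangling) for the complete, corrected code.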
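Scrapy is an effective, fast, and easily configured tool. There is much more to explore, and you can read the library's excellent documentation (http://doc.scrapy.org/en/latest/). Configuring your scripts to use databases and specialized information extraction tools, and running them on your own server with Scrapyd (http://scrapyd.readthedocs.org/en/latest/), is straightforward. We hope this is the first of many Scrapy projects to come!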
You now understand screen readers, browser readers, and spiders. Let's look at some other things you need to know to build more complex web scrapers.
12.3 Networks: How the Internet Works and Why It's Breaking Your Script
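Depending on how often you run your scraping scripts and how important each script's work is, you may run into network problems. Yes, the internet is trying to break your script. Why? Because the internet assumes that if you really care, you will retry. Dropped connections, proxy problems, and timeouts are common in the web scraping world ...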
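The "if you really care, you will retry" idea can be sketched with the Python standard library alone. The following is a minimal illustration, not code from the book: the function name `fetch_with_retries` and its parameters (`fetch`, `retries`, `timeout`, `backoff`, `sleep`) are hypothetical, and the exponential backoff schedule is one common choice among several.

```python
import socket
import time
import urllib.error
import urllib.request


def fetch_with_retries(url, fetch=None, retries=3, timeout=10,
                       backoff=1.0, sleep=time.sleep):
    """Fetch url, retrying on network errors with exponential backoff.

    Every name here is hypothetical and chosen for illustration only.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")

    if fetch is None:
        # Default fetcher: a plain standard-library HTTP GET with a timeout.
        def fetch(target, timeout):
            with urllib.request.urlopen(target, timeout=timeout) as response:
                return response.read()

    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url, timeout)
        except (urllib.error.URLError, socket.timeout, ConnectionError) as error:
            last_error = error
            if attempt < retries - 1:
                # Wait 1x, 2x, 4x ... the base delay before the next attempt.
                sleep(backoff * 2 ** attempt)
    # Every attempt failed: surface the most recent error to the caller.
    raise last_error
```

Passing the fetcher and the sleep function in as arguments keeps the retry logic separate from the network call, so it can be exercised without a live connection. Note that `urllib.error.HTTPError` is a subclass of `URLError`, so this sketch also retries HTTP error responses; a real scraper may prefer to give up immediately on client errors such as 404.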
Publisher Resources

ISBN: 9787115459190