Chapter 8  Scrapy

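Scrapy is a popular web scraping framework that provides a number of high-level features to simplify scraping websites. In this chapter, we will learn to use Scrapy to scrape the example website, with the same goal as in Chapter 2. We will then introduce Portia, an application built on Scrapy that lets users scrape a website through a point-and-click interface.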

In this chapter, we will cover the following topics:

Getting started with Scrapy;
Creating a spider;
Comparing different spider types;
Crawling with Scrapy;
Writing visual scrapers with Portia;
Automating scraping with Scrapely.

8.1  Installing Scrapy

We can install Scrapy using the pip command, as follows.

pip install scrapy

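Because Scrapy depends on several external libraries, installation can occasionally fail. If you run into trouble, more information is available on the official website at http://doc.scrapy.org/en/latest/intro/install.html.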

If Scrapy installed successfully, the scrapy command can now be run in the terminal.

$ scrapy
Scrapy 1.3.3 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
        bench    Run quick benchmark test
        commands
        fetch    Fetch a URL using the Scrapy downloader
...
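
The installation can also be checked from Python rather than from the terminal. The following is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata); the helper name installed_version is hypothetical and is not part of Scrapy.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    version = installed_version("scrapy")
    if version is None:
        print("Scrapy is not installed; try: pip install scrapy")
    else:
        print("Scrapy", version, "is installed")
```

The reported version depends on the environment and may differ from the 1.3.3 shown in the terminal output above.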

We will use the following commands in this chapter.

startproject: creates a new project.
genspider: generates a new spider from a template.
crawl: runs a spider.
shell: starts the interactive scraping console.
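
To show how these four commands typically fit together, here is a rough sketch that builds the command lines in the order they might be used. The project name example, the spider name country, and the example.com addresses are placeholders chosen for illustration and do not come from the text above. The run helper is likewise hypothetical; it is defined but deliberately not called, so the script only prints the commands.

```python
import shutil
import subprocess


def scrapy_command(subcommand, *args):
    """Build the argument list for a scrapy subcommand."""
    return ["scrapy", subcommand, *args]


# Placeholder names for illustration: project "example", spider "country".
WORKFLOW = [
    scrapy_command("startproject", "example"),              # create a new project
    scrapy_command("genspider", "country", "example.com"),  # generate a spider from a template
    scrapy_command("crawl", "country"),                     # run the spider
    scrapy_command("shell", "http://example.com"),          # open the interactive console
]


def run(argv, cwd=None):
    """Run one command if its executable is on PATH; return its exit code, else None."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, cwd=cwd).returncode


if __name__ == "__main__":
    # Only print the commands; nothing is executed here.
    for argv in WORKFLOW:
        print(" ".join(argv))
```

Note that crawl needs to be run from inside a project directory, and a spider generated outside a project is not added to one, so in practice the cwd argument of run would point at the directory created by startproject.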

　

For detailed information about these and other commands, refer to http://doc.scrapy. ...

用Python写网络爬虫（第2版） by Posts & Telecom Press, Katharine Jarmul
