November 2017
Intermediate to advanced
226 pages
5h 59m
English
Let's build a simple link extractor with Scrapy:
In the new spider file, import the required modules:
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import Rule, CrawlSpider
class HomeSpider2(CrawlSpider):
    name = 'home2'
    allowed_domains = ['books.toscrape.com']
    start_urls = ['http://books.toscrape.com/']
    rules = [
        Rule(
            LinkExtractor(
                canonicalize=True,
                unique=True
            ),
            follow=True,
            callback="parse_page"
        )
    ]
This rule orders the extraction of all unique and canonicalized links, and ...
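To make "unique and canonicalized" more concrete, here is a minimal standard-library sketch of the idea. It is not Scrapy's implementation: the helper names canonicalize and unique_links are hypothetical, and the normalization rules (lowercase host, sorted query arguments, dropped fragment) are a simplified assumption about what canonicalization typically involves.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def canonicalize(url):
    """Simplified, hypothetical canonicalization (not Scrapy's own code):
    lowercase the host, sort query arguments, drop the fragment."""
    parts = urlsplit(url)
    # Sort query arguments so that ?b=2&a=1 and ?a=1&b=2 compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Treat an empty path as the site root.
    path = parts.path or "/"
    # The empty last element discards the fragment (#...).
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))


def unique_links(urls):
    """Return canonicalized URLs in first-seen order, without duplicates."""
    seen = set()
    result = []
    for url in urls:
        canon = canonicalize(url)
        if canon not in seen:
            seen.add(canon)
            result.append(canon)
    return result


links = [
    "http://books.toscrape.com/catalogue/page-2.html",
    "http://books.toscrape.com/catalogue/page-2.html#top",
    "http://Books.toscrape.com/catalogue/page-2.html",
    "http://books.toscrape.com/search?b=2&a=1",
    "http://books.toscrape.com/search?a=1&b=2",
]
print(unique_links(links))
```

The five example URLs (made up for illustration) collapse to two entries, because the variants differ only in fragment, host casing, or query-argument order. In the spider above, the canonicalize and unique arguments ask LinkExtractor to do this kind of work for you.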