May 2017
Beginner to intermediate
220 pages
5h 2m
English
Here is the completed version of our spider:
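```python
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule

from ..items import CountryItem  # item class defined in the project's items.py


class CountrySpider(CrawlSpider):
    name = 'country'
    start_urls = ['http://example.webscraping.com/']
    allowed_domains = ['example.webscraping.com']

    rules = (
        # Follow the paginated index pages, skipping the /user/ account pages
        Rule(LinkExtractor(allow=r'/index/', deny=r'/user/'), follow=True),
        # Send each country page to parse_item
        Rule(LinkExtractor(allow=r'/view/', deny=r'/user/'), callback='parse_item'),
    )

    def parse_item(self, response):
        item = CountryItem()
        # Country name via a CSS selector
        name_css = 'tr#places_country__row td.w2p_fw::text'
        item['name'] = response.css(name_css).extract()
        # Population via an equivalent XPath selector
        pop_xpath = '//tr[@id="places_population__row"]/td[@class="w2p_fw"]/text()'
        item['population'] = response.xpath(pop_xpath).extract()
        return item
```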
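To see which links each rule handles, here is a minimal pure-Python sketch of the allow/deny matching. It assumes, as LinkExtractor does, that each pattern is tested with `re.search` against the absolute URL, and that the first matching rule claims a link. The `RULES` table, the `classify()` helper and the sample URLs are hypothetical illustrations, not part of Scrapy.

```python
import re
from typing import Optional

# Hypothetical mirror of the spider's two rules: (allow, deny, action).
# Order matters: the first rule that accepts a link handles it.
RULES = [
    (r"/index/", r"/user/", "follow"),      # listing pages: keep crawling
    (r"/view/", r"/user/", "parse_item"),   # country pages: extract data
]


def classify(url: str) -> Optional[str]:
    """Return the action of the first rule whose allow pattern matches
    the URL and whose deny pattern does not; None if no rule applies."""
    for allow, deny, action in RULES:
        if re.search(allow, url) and not re.search(deny, url):
            return action
    return None


if __name__ == "__main__":
    base = "http://example.webscraping.com/places/default"
    for path in ["/index/1", "/view/Afghanistan-1", "/user/login?_next=/index/1"]:
        print(path, "->", classify(base + path))
```

Running it prints `follow`, `parse_item` and `None` for the three paths. The last case shows why `deny=r'/user/'` appears in both rules: a login link whose query string mentions `/index/` would otherwise be followed.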
To save the results, we could define a Scrapy pipeline or set up an output ...
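As an illustration of the pipeline option, here is a minimal sketch of an item pipeline that writes each scraped country to a CSV file. It relies only on Scrapy's standard pipeline hooks (`open_spider`, `process_item`, `close_spider`), so it is plain Python; the class name `CountryCsvPipeline`, the output path `countries.csv` and the `myproject` package name in the settings comment are hypothetical.

```python
import csv


class CountryCsvPipeline:
    """Hypothetical item pipeline that writes one CSV row per country.

    Enable it in settings.py (assuming a project package named "myproject"):
        ITEM_PIPELINES = {"myproject.pipelines.CountryCsvPipeline": 300}
    """

    def __init__(self, path="countries.csv"):
        self.path = path
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the file and write a header.
        self.file = open(self.path, "w", newline="", encoding="utf-8")
        self.writer = csv.writer(self.file)
        self.writer.writerow(["name", "population"])

    def process_item(self, item, spider):
        # .extract() returns lists of strings, so join each field into one value.
        row = dict(item)
        self.writer.writerow(
            ["".join(row.get(field, [])).strip() for field in ("name", "population")]
        )
        return item  # pass the item on to any later pipelines

    def close_spider(self, spider):
        # Called once when the spider finishes: flush and close the file.
        self.file.close()
```

Joining the lists inside the pipeline keeps `parse_item` unchanged, so the spider's extraction and the output format can evolve independently.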