December 2018
Beginner to intermediate
684 pages
21h 9m
English
Scrapy is a powerful library for building bots that follow links, retrieve content, and store the parsed results in a structured way. Combined with the headless browser Splash, it can also render JavaScript, which makes it an efficient alternative to Selenium. You can run the spider with the scrapy crawl opentable command from the 01_opentable directory; the results are logged to spider.log:
from opentable.items import OpentableItem
from scrapy import Spider
from scrapy_splash import SplashRequest


class OpenTableSpider(Spider):
    name = 'opentable'
    start_urls = ['https://www.opentable.com/new-york-restaurant-listings']

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(url=url ...
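To make the "store the parsed result in a structured way" step concrete, here is a minimal sketch that uses only the Python standard library as a stand-in for a spider's parse callback. It is not Scrapy's API and not the book's implementation. The sample markup, the CSS class names (rest-row, rest-row-name, rest-row-price), and the RestaurantItem fields are all hypothetical, since the text does not show the page structure or the item definition.

```python
import json
from dataclasses import asdict, dataclass
from html.parser import HTMLParser

# Hypothetical markup: the real page structure is not given in the text.
SAMPLE_HTML = """
<div class="rest-row">
  <span class="rest-row-name">Sample Bistro</span>
  <span class="rest-row-price">$$</span>
</div>
<div class="rest-row">
  <span class="rest-row-name">Example Grill</span>
  <span class="rest-row-price">$$$</span>
</div>
"""


@dataclass
class RestaurantItem:
    """Hypothetical structured record, playing the role of a Scrapy item."""
    name: str = ""
    price: str = ""


class ListingParser(HTMLParser):
    """Collects one RestaurantItem per <div class="rest-row"> element."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._field = None  # item field that the next text node belongs to

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class", "")
        if tag == "div" and css_class == "rest-row":
            self.items.append(RestaurantItem())
        elif tag == "span" and css_class == "rest-row-name":
            self._field = "name"
        elif tag == "span" and css_class == "rest-row-price":
            self._field = "price"

    def handle_data(self, data):
        text = data.strip()
        if self._field and self.items and text:
            setattr(self.items[-1], self._field, text)

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def parse_listings(html):
    """Return the listings found in html as a list of plain dicts."""
    parser = ListingParser()
    parser.feed(html)
    return [asdict(item) for item in parser.items]


records = parse_listings(SAMPLE_HTML)
print(json.dumps(records, indent=2))
```

In a Scrapy project, the equivalent logic would live in the spider's parse callback and yield item objects, leaving export to the framework.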