February 2018
Beginner to intermediate
364 pages
10h 32m
English
Let's walk through the spider to see how this works. The spider starts with the following definition of the start URL:
import json

import scrapy


class Spider(scrapy.Spider):
    name = 'spidyquotes'
    quotes_base_url = 'http://spidyquotes.herokuapp.com/api/quotes'
    start_urls = [quotes_base_url]
    download_delay = 1.5  # seconds to wait between consecutive requests
The parse method then prints the response and also parses the JSON into the data variable:
    def parse(self, response):
        print(response)
        data = json.loads(response.body)
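The parsing step can be tried outside Scrapy. The following is a minimal sketch, assuming a response body shaped like the one the walkthrough describes; `sample_body` and `decode_body` are hypothetical names used only for illustration, and the payload is invented rather than taken from the real API.

```python
import json

# Hypothetical stand-in for response.body. Scrapy exposes the body as bytes,
# and json.loads accepts bytes directly on Python 3.6 and later.
sample_body = (
    b'{"quotes": [{"text": "A sample quote.", '
    b'"author": {"name": "Jane Doe"}}]}'
)


def decode_body(body):
    """Parse a raw JSON body (bytes or str) into Python objects."""
    return json.loads(body)


data = decode_body(sample_body)
print(type(data).__name__, list(data.keys()))
```

The result is an ordinary dictionary, so the rest of the parse method can work with it using plain dictionary lookups.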
It then loops through the items in the quotes element of the JSON object. For each item, it yields a new Scrapy item back to the Scrapy engine:
        for item in data.get('quotes', []):
            yield {
                'text': item.get('text'),
                'author': item.get('author', {}).get('name'),
                'tags': item.get ...
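The defensive lookups in that loop can be seen in isolation with a small generator. This is a sketch under stated assumptions, not the spider's actual code: `extract_quotes` and `sample_data` are hypothetical, the sample records are invented, and the tags field is left out because the listing above is truncated at that point.

```python
def extract_quotes(data):
    """Yield one plain dict per entry under the 'quotes' key.

    data.get('quotes', []) makes a missing key behave like an empty list,
    and item.get('author', {}) lets the chained .get('name') return None
    when an entry has no author at all.
    """
    for item in data.get('quotes', []):
        yield {
            'text': item.get('text'),
            'author': item.get('author', {}).get('name'),
        }


# Invented sample data; the second entry has no 'author' key on purpose.
sample_data = {
    'quotes': [
        {'text': 'First sample quote.', 'author': {'name': 'Jane Doe'}},
        {'text': 'Second sample quote.'},
    ]
}

items = list(extract_quotes(sample_data))
for entry in items:
    print(entry)
```

Within a spider, Scrapy accepts plain dictionaries yielded from a callback as items, which is why the loop above can yield a dict directly.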