After a lengthy introduction to Beautiful Soup and custom scrapers, it’s time to look at Scrapy: the website scraping tool for Python.
In my opinion, this is the only viable tool available currently for Python, which can handle complex scraping tasks out of the box. You can cache web pages, and add parallelism as you wish; you only need to configure Scrapy properly and write the extraction code.
In this chapter you will learn how to get the most out of Scrapy for the majority of your website scraping projects. You will write the Sainsbury’s extractor, configure Scrapy to create a ...