NLTK Essentials by Nitin Hardeniya

Data flow in Scrapy

The execution engine controls the data flow in Scrapy, which proceeds as follows:

  1. The process starts with locating the chosen spider and opening the first URL from the list of start_urls.
  2. The first URL is then scheduled as a request in the scheduler. This step is internal to Scrapy.
  3. The Scrapy engine then looks for the next set of URLs to crawl.
  4. The scheduler then sends the next URLs to the engine, and the engine forwards them to the downloader through the downloader middlewares. These middlewares are where we place different proxy and user-agent settings.
  5. The downloader fetches the page and passes the response to the spider, where the parse method selects specific elements from the response.
  6. Then, the spider sends ...
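
Steps 1 to 5 above can be sketched as a small, self-contained toy model. This is not Scrapy's actual implementation or API: the names `ToySpider`, `user_agent_middleware`, `download`, `run_engine`, and the `FAKE_WEB` pages are all hypothetical stand-ins, and the "downloader" reads from an in-memory dictionary rather than the network. For simplicity, the sketch schedules every entry in `start_urls` up front.

```python
from collections import deque

# Hypothetical stand-in pages so that no network access is needed.
FAKE_WEB = {
    "http://example.com/a": "<title>Page A</title>",
    "http://example.com/b": "<title>Page B</title>",
}


class ToySpider:
    """Plays the role of a spider: holds start_urls and a parse method."""
    name = "toy"
    start_urls = ["http://example.com/a", "http://example.com/b"]

    def parse(self, response):
        # Select one specific element (the title text) from the response.
        body = response["body"]
        start = body.find("<title>") + len("<title>")
        end = body.find("</title>")
        return {"url": response["url"], "title": body[start:end]}


def user_agent_middleware(request):
    """Toy downloader middleware: sets a user-agent header on the request."""
    request["headers"]["User-Agent"] = "toy-crawler/0.1"
    return request


def download(request):
    """Toy downloader: looks the page up in FAKE_WEB instead of fetching it."""
    return {
        "url": request["url"],
        "body": FAKE_WEB[request["url"]],
        "request_headers": request["headers"],
    }


def run_engine(spider, middlewares):
    """Toy engine loop that mirrors steps 1 to 5 of the list above."""
    scheduler = deque()
    # Steps 1-2: take the spider's start_urls and schedule them as requests.
    for url in spider.start_urls:
        scheduler.append({"url": url, "headers": {}})
    items = []
    # Step 3: the engine keeps asking the scheduler for the next request.
    while scheduler:
        request = scheduler.popleft()
        # Step 4: the request passes through the downloader middlewares.
        for middleware in middlewares:
            request = middleware(request)
        # Step 5: the downloader returns a response, which the spider parses.
        response = download(request)
        items.append(spider.parse(response))
    return items


items = run_engine(ToySpider(), [user_agent_middleware])
print(items)
```

Running the sketch prints one dictionary per start URL, each holding the URL and the extracted title. In a real Scrapy project the engine, scheduler, and downloader are provided by the framework; you would normally write only the spider and, if needed, the middlewares.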
