January 2020
Intermediate to advanced
640 pages
16h 56m
English
All links that survive the link filter are consumed by the link fetcher component. As its name implies, this component is responsible for establishing an HTTP connection to each link target and retrieving any content returned by the server at the other end.
The fetcher inspects the HTTP status code and any HTTP headers returned by remote servers. If the returned status code indicates that the content has been moved to a different location (for example, 301 or 302), the fetcher will automatically follow redirects until it reaches the content's final destination. It stands to reason that we would not want our fetcher to get stuck in an infinite redirect loop trying to crawl an incorrectly configured (or malicious) ...