The robotparser module provides a class that can be used to fetch and query information contained in the robots.txt files that websites use to instruct web crawlers and spiders. The contents of this file typically look like this:
# robots.txt
User-agent: *
Disallow: /warheads/designs     # Don't allow robots here
RobotFileParser([url])
Creates an object that can be used to read and query a single robots.txt file. url, if supplied, is the URL of the robots.txt file.
An instance, r, of RobotFileParser has the following attributes and methods:
r.set_url(url)
Sets the URL of the robots.txt file.
r.read()
Reads the robots.txt file from its URL and parses it.
r.parse(lines)
Parses a list of lines obtained from a robots.txt file. The resulting data is saved internally for use with other methods.
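A minimal sketch of the methods above, feeding the sample robots.txt into parse() and then querying the result with the module's can_fetch() method. In Python 3 the module lives at urllib.robotparser; the user-agent name and URLs below are hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser  # "import robotparser" on Python 2

# Lines as they might be read from a robots.txt file
lines = [
    "# robots.txt",
    "User-agent: *",
    "Disallow: /warheads/designs     # Don't allow robots here",
]

r = RobotFileParser()
r.parse(lines)          # parse a list of lines directly, no network access needed

# Query whether a given user agent may fetch a given URL
print(r.can_fetch("SomeBot", "http://www.example.com/warheads/designs/secret.html"))
print(r.can_fetch("SomeBot", "http://www.example.com/index.html"))
```

To fetch a live file instead of supplying the lines yourself, set_url() and read() replace the parse() call (for example, r.set_url("http://www.example.com/robots.txt") followed by r.read()), after which can_fetch() is used the same way.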