Python: Essential Reference, Third Edition by David Beazley

robotparser

The robotparser module provides a class that can be used to fetch and query information contained in the robots.txt files that websites use to instruct web crawlers and spiders. The contents of this file typically look like this:

# robots.txt
User-agent: *
Disallow: /warheads/designs   # Don't allow robots here

RobotFileParser()

Creates an object that can be used to read and query a single robots.txt file.

An instance, r, of RobotFileParser has the following attributes and methods:

r.set_url(url)

Sets the URL of the robots.txt file.

r.read()

Reads the robots.txt file and parses it.

r.parse(lines)

Parses a list of lines obtained from a robots.txt file. The resulting data is saved internally for use with other methods.
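For instance, parse() can be used when the contents of a robots.txt file have already been retrieved by some other means. The following sketch simply reuses the sample file shown above as an in-memory string:

import robotparser

data = """# robots.txt
User-agent: *
Disallow: /warheads/designs
"""

r = robotparser.RobotFileParser()
r.parse(data.splitlines())                    # Parse lines already in memory
print r.can_fetch("*", "/warheads/designs")   # Prints False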

r.can_fetch(useragent, url)

Returns True if useragent is permitted to fetch url according to the rules contained in the parsed robots.txt file; returns False otherwise.
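A typical use of the module is to check a URL before crawling it. The following sketch shows the usual sequence of calls; the site address and the "SpamBot" user agent are hypothetical:

import robotparser

r = robotparser.RobotFileParser()
r.set_url("http://www.example.com/robots.txt")   # Hypothetical site
r.read()                                         # Fetch and parse the file

# Check whether a crawler identifying itself as "SpamBot" may fetch a page
if r.can_fetch("SpamBot", "http://www.example.com/warheads/designs"):
    print "Fetching is allowed"
else:
    print "Fetching is not allowed"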
