Python: Essential Reference, Third Edition

by David Beazley
February 2006
Content level: Intermediate to advanced
648 pages
14h 53m
English
Sams

robotparser

The robotparser module provides a class that can be used to fetch and query information contained in the robots.txt files that websites use to instruct web crawlers and spiders. The contents of this file typically look like this:

# robots.txt
User-agent: *
Disallow: /warheads/designs   # Don't allow robots here

						RobotFileParser()

Creates an object that can be used to read and query a single robots.txt file.

An instance, r, of RobotFileParser has the following attributes and methods:

						r.set_url(url)

Sets the URL of the robots.txt file.

						r.read()

Reads the robots.txt file and parses it.
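
As a minimal sketch of the set_url() and read() sequence, the following writes a small robots.txt to a temporary file and loads it through a file:// URL so that it runs without network access. This assumes Python 3, where the module is available as urllib.robotparser (in the Python 2 releases this book covers, it is imported as robotparser). The temporary file, the "ExampleBot" user agent, and the queried path are made up for illustration; a real crawler would pass something like http://www.example.com/robots.txt to set_url().

```python
import os
import pathlib
import tempfile
from urllib.robotparser import RobotFileParser  # Python 2: import robotparser

# Write a small robots.txt to a temporary file so the sketch runs offline.
tmpdir = tempfile.mkdtemp()
robots_path = os.path.join(tmpdir, "robots.txt")
with open(robots_path, "w") as f:
    f.write("User-agent: *\nDisallow: /warheads/designs\n")

r = RobotFileParser()
# Illustrative file:// URL; a crawler would normally use an http(s) URL here.
r.set_url(pathlib.Path(robots_path).as_uri())
r.read()   # fetches the URL set above and parses its contents

# Hypothetical user agent and path, used only to show that the rules loaded.
print(r.can_fetch("ExampleBot", "/warheads/designs/plans.html"))
```

Calling read() before set_url() would leave the parser with no URL to fetch, so the two calls are normally made in this order.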

						r.parse(lines)

Parses a list of lines obtained from a robots.txt file. The resulting data is saved internally for use with other methods.
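
When the contents of robots.txt are already in memory, parse() can be given the lines directly. The sketch below feeds it the sample file shown earlier and then queries the result with can_fetch(useragent, url). It assumes Python 3's urllib.robotparser, with a fallback to the Python 2 robotparser module described here; the "ExampleBot" user agent and the two paths are hypothetical.

```python
try:
    from urllib.robotparser import RobotFileParser  # Python 3 location
except ImportError:
    from robotparser import RobotFileParser          # Python 2 module

# The sample robots.txt from above, as a list of lines.
lines = [
    "# robots.txt",
    "User-agent: *",
    "Disallow: /warheads/designs   # Don't allow robots here",
]

r = RobotFileParser()
r.parse(lines)   # no network access; the rules are stored on r

# Hypothetical user agent and paths, for illustration only.
blocked = r.can_fetch("ExampleBot", "/warheads/designs/plans.html")
allowed = r.can_fetch("ExampleBot", "/index.html")
print(blocked, allowed)
```

Under these rules the first query falls beneath the disallowed prefix and the second does not, so the two results differ.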

						r.can_fetch( ...
ISBN: 0672328623