Linux Server Security, Second Edition

by Michael D. Bauer
January 2005
Content level: Intermediate to advanced
544 pages
23h 44m
English
O'Reilly Media, Inc.
Copyright © 2007 O’Reilly & Associates, Inc. All rights reserved.
Robots and Spiders
Some hits to your web site will come from programs called robots. Some of these
gather data for search engines and are also called spiders. A well-behaved robot is
supposed to read and obey the robots.txt file in the top directory of your site. This
file tells it which files and directories it may not search, so each web site should
have one. Exclude all directories with CGI scripts (anything marked as ScriptAlias,
such as /cgi-bin), images, access-controlled content, or any other content that
should not be exposed to the world. Here’s a simple example:
User-agent: *
Disallow: /image_dir
Disallow: /cgi-bin
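One way to check how a compliant robot would read these rules is to feed them to Python's standard urllib.robotparser module. The following is only a sketch: the is_allowed helper and the ExampleBot agent name are made up for illustration, and the rules are copied from the example above.

```python
from urllib.robotparser import RobotFileParser

# The example rules from the text, held in a string for this sketch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /image_dir
Disallow: /cgi-bin
"""

# Parse the rules once; a real robot would fetch /robots.txt from the site.
_parser = RobotFileParser()
_parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, user_agent="ExampleBot"):
    """Return True if a rule-abiding robot may fetch the given path.

    "ExampleBot" is a hypothetical agent name; it matches the "*" record.
    """
    return _parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/cgi-bin/search.cgi", "/image_dir/logo.png"):
        print(path, is_allowed(path))
```

This only models a robot that chooses to obey the file. Nothing in robots.txt stops a robot that ignores it.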
Many robots are spiders, used by web search engines to help catalogue the Web’s
vast expanses. Good ones obey the robots.txt rules and have other indexing
heuristics. They try to examine only static content and ignore things that look like
CGI scripts (such as URLs containing ? or /cgi-bin). To make CGI scripts
search-engine friendly, a script can read its arguments from the PATH_INFO
environment variable, and Apache rewriting rules can present it under a URL that
looks static.
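As a rough illustration of the PATH_INFO approach, a CGI script can take its arguments from the extra path segments that follow the script name instead of from a query string. The sketch below assumes a hypothetical URL layout in which the first segment is a category and the second is an item. The function name and parameter names are invented for the example.

```python
import os


def path_info_params(environ, names=("category", "item")):
    """Map PATH_INFO segments onto parameter names.

    The web server supplies PATH_INFO already URL-decoded, so the value
    is only split on "/" here. The names tuple is a hypothetical layout.
    """
    raw = environ.get("PATH_INFO", "")
    segments = [segment for segment in raw.split("/") if segment]
    # zip() stops at the shorter sequence, so missing segments are omitted.
    return dict(zip(names, segments))


if __name__ == "__main__":
    # Under a CGI-capable server, os.environ carries the request variables.
    print("Content-Type: text/plain")
    print()
    print(path_info_params(os.environ))
```

With such a script, a request whose extra path is /books/42 yields the same parameters as a query string like ?category=books&item=42, but the URL contains no ? for a spider to shy away from.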
The robot exclusion standard is documented at
http://www.robotstxt.org/wc/norobots.html and
http://www.robotstxt.org/wc/robots.html.
Rude robots can be excluded with environment variables and access control: ...
ISBN: 0596006705