SEO Warrior

by John I Jerkovic
November 2009
496 pages
English
O'Reilly Media, Inc.

Dealing with Rogue Spiders

Not all crawlers obey the Robots Exclusion Protocol (REP). Some rogue spiders go to great lengths to pose as one of the major search engine spiders, for example by sending a forged user-agent string. To deal with this situation, we can take advantage of the fact that the major search engines support reverse DNS crawler authentication.

Reverse DNS Crawler Authentication

Setting up reverse DNS crawler authentication is straightforward. Yahoo! describes the procedure on its blog:

  1. For each page view request, check the user-agent and IP address. All requests from Yahoo! Search utilize a user-agent starting with ‘Yahoo! Slurp.’

  2. For each request from ‘Yahoo! Slurp’ user-agent, you can start with the IP address (e.g., 74.6.67.218) and use reverse DNS lookup to find out the registered name of the machine.

  3. Once you have the host name (in this case, lj612134.crawl.yahoo.net), you can then check if it really is coming from Yahoo! Search. The name of all Yahoo! Search crawlers will end with ‘crawl.yahoo.net,’ so if the name doesn’t end with this, you know it’s not really our crawler.

  4. Finally, you need to verify the name is accurate. In order to do this, you can use Forward DNS to see the IP address associated with the host name. This should match the IP address you used in Step 2. If it doesn’t, it means the name was fake.
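The four steps above can be sketched as a single PHP function. This is a minimal sketch under stated assumptions, not the book's listing: the function name `is_verified_crawler()` and its parameters are hypothetical. By default it uses PHP's built-in `gethostbyaddr()` for the reverse lookup and `gethostbynamel()` for the forward lookup (IPv4 only), and both lookups can be passed in as callables so the logic can be exercised without network access.

```php
<?php
// Hypothetical helper: verifies a claimed crawler IP with a reverse lookup
// (IP -> host name) followed by a forward lookup (host name -> IPs).
function is_verified_crawler(
    string $ip,
    string $requiredSuffix = '.crawl.yahoo.net',
    ?callable $reverseLookup = null,
    ?callable $forwardLookup = null
): bool {
    // Default to PHP's built-in resolver functions.
    $reverseLookup = $reverseLookup ?? 'gethostbyaddr';
    $forwardLookup = $forwardLookup ?? 'gethostbynamel';

    // Step 2: reverse DNS lookup. gethostbyaddr() returns false on malformed
    // input and the unmodified IP string when the lookup fails.
    $host = $reverseLookup($ip);
    if (!is_string($host) || $host === $ip) {
        return false;
    }

    // Step 3: the host name must end with the crawler's domain. The leading
    // dot in the suffix means only subdomains of crawl.yahoo.net match.
    $host = strtolower(rtrim($host, '.'));
    if (substr($host, -strlen($requiredSuffix)) !== $requiredSuffix) {
        return false;
    }

    // Step 4: forward DNS lookup; the original IP must be among the
    // addresses the host name resolves to, otherwise the name was faked.
    $addresses = $forwardLookup($host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}
```

Called with only an IP, for example `is_verified_crawler('74.6.67.218')`, the function performs live DNS queries, so its result depends on the records in place at the time. Checking the whole address list from `gethostbynamel()`, rather than the single address `gethostbyname()` returns, avoids false rejections when a crawler host has several A records.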

As you can see, checking for rogue spiders with the reverse DNS approach is relatively easy. Here is the Yahoo! approach translated to PHP:

<?php
$ua = $_SERVER['HTTP_USER_AGENT'];
$httpRC403 = "HTTP/1.0 403 Forbidden";
$slurp = 'slurp';
...
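The listing above is cut off, so the following is a separate, hypothetical sketch (not the book's code) of how the user-agent test from step 1 and the 403 response suggested by `$httpRC403` might fit together. The name `crawler_gate()` is an assumption, as is the choice to match "slurp" case-insensitively anywhere in the user-agent string. The DNS check is passed in as any callable that takes an IP and returns a bool.

```php
<?php
// Hypothetical gatekeeper: returns the status line to send, or null to let
// the request through. $verify performs the reverse/forward DNS check
// (steps 2-4) and returns true only for a genuine crawler IP.
function crawler_gate(string $userAgent, string $ip, callable $verify): ?string
{
    // Step 1: only requests claiming to be Yahoo! Slurp need checking.
    if (stripos($userAgent, 'slurp') === false) {
        return null;
    }
    // A claimed Slurp request that fails DNS verification is refused.
    return $verify($ip) ? null : 'HTTP/1.0 403 Forbidden';
}

// Example wiring inside a page (not executed here; my_dns_check is a
// hypothetical verification function):
// $status = crawler_gate($_SERVER['HTTP_USER_AGENT'] ?? '',
//                        $_SERVER['REMOTE_ADDR'], 'my_dns_check');
// if ($status !== null) { header($status); exit; }
```

Returning the status line instead of calling `header()` directly keeps the decision logic separate from the response, so it can be tested without a web server.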

ISBN: 9780596804749