Summary
A full understanding of the Robots Exclusion Protocol (REP) is crucial. Note that not every search engine supports REP in the same way. The good news is that the most popular search engines are now working together to offer more uniform REP support, which reduces the work required to address each engine's needs separately.
Using robots.txt to block crawlers from specific site areas is an important tactic in SEO. In most cases, you should use robots.txt at the directory (site) level. With the introduction of wildcards, you can handle common SEO problems such as content duplication with relative ease. Although the Sitemap directive is a welcome addition to robots.txt, Google still encourages webmasters to add their Sitemaps manually through the Google Webmaster Tools platform.
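For illustration, a minimal robots.txt along these lines might look like the following sketch; the blocked paths and the Sitemap URL are hypothetical:

    # Apply to all crawlers
    User-agent: *
    # Block a hypothetical printer-friendly duplicate of the site
    Disallow: /print/
    # Wildcard: block URLs carrying session IDs (a common source of duplicate content)
    Disallow: /*?sessionid=

    # Hypothetical Sitemap location
    Sitemap: http://www.example.com/sitemap.xml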
Using HTML meta tags and their HTTP header equivalents is a way to specify indexing directives at the page level for both HTML and non-HTML resources. These directives are harder to maintain, and you should use them only where required.
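As a sketch of both variants, the snippets below show a page-level meta tag and one possible Apache configuration that sends the equivalent X-Robots-Tag header for PDF files (it assumes mod_headers is enabled):

    <!-- Page-level directive placed inside the <head> of an HTML document -->
    <meta name="robots" content="noindex, nofollow">

    # Apache (mod_headers): the same directive as an HTTP header for non-HTML files
    <FilesMatch "\.pdf$">
        Header set X-Robots-Tag "noindex, noarchive"
    </FilesMatch>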
Not all web spiders honor REP. At times, it may be necessary to block their attempts to crawl your site. You can do this in many different ways, including application code, server-side configuration, firewalls, and intrusion detection devices.
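As one example of the server-side configuration approach, an Apache rewrite rule can refuse requests from a misbehaving crawler; "BadBot" is a placeholder User-Agent string, and the rule assumes mod_rewrite is available:

    # Apache (mod_rewrite): deny a crawler by its User-Agent string
    RewriteEngine On
    # "BadBot" is a placeholder; the match is case-insensitive ([NC])
    RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
    # Respond with 403 Forbidden
    RewriteRule .* - [F,L]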