More on robots.txt
Using robots.txt is the original way to tell crawlers what not to crawl. This method is particularly helpful when you do not want search engines to crawl some or all of your website. Perhaps your website is not ready to be browsed by the general public, or you simply have materials that are not appropriate for inclusion in the SERPs.
Always think of robots.txt in the context of crawling, never indexing. Crawling rules govern which documents on your website a crawler may access. The robots.txt standard is almost always applied at the sitewide level, whereas the robots HTML meta tag is limited to the page level or lower. You can use robots.txt for individual files, but you should avoid this practice because of the additional maintenance overhead.
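To make the idea of crawling rules concrete, here is a minimal sketch that checks two URLs against a small robots.txt using Python's standard `urllib.robotparser` module. The rules, the `ExampleBot` user agent, and the `www.example.com` URLs are all hypothetical, and this parser represents only one interpretation of the standard, so real crawlers may behave differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep every crawler out of one directory, sitewide.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Load robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask whether a (hypothetical) crawler may access each document.
allowed_public = parser.can_fetch(
    "ExampleBot", "https://www.example.com/index.html"
)
allowed_private = parser.can_fetch(
    "ExampleBot", "https://www.example.com/private/report.html"
)

print("index.html crawlable:", allowed_public)
print("private/report.html crawlable:", allowed_private)
```

A single directory-level `Disallow` rule covers every file beneath `/private/`, which is why directory rules tend to be easier to maintain than a separate entry for each file.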
Note
Not all web spiders interpret or support the robots.txt file in exactly the same way. Although the big three search engines have started to collaborate on the robots.txt standard, they still differ in how they support it.
Is robots.txt an absolute requirement for every website? In short, no; but the use of robots.txt is highly encouraged, as it can play a vital role in SEO issues such as content duplication.
Creation of robots.txt
Creating robots.txt is straightforward and can be done in any simple text editor. Once you’ve created the file, you should give it read permissions so that it is visible to the outside world. On ...