Sometimes publishers produce great content and then, for one reason or another, fail to expose that content to search engines. In “Content Delivery and Search Spider Control” in Chapter 6, we discussed ways that you can hide content from the search engines when you want to. At times, however, content is hidden unintentionally. Valuable content can be inadvertently hidden from the search engines, and occasionally the engines can find hidden content and construe it as spam, whether that was your intent or not.
Identifying Content That Engines Don’t See
How do you determine when this is happening? Sometimes the situation is readily apparent. For example, if you have a site that receives high traffic volume and your developer accidentally NoIndexes every page on the site, you will begin to see a catastrophic drop in traffic. Most likely it will set off a panicked investigation that identifies the NoIndex issue as the culprit.
Does this really happen? Unfortunately, it does. Here is an example scenario. You work on site updates on a staging server. Because you don’t want the search engines to discover this duplicate version of your site, you keep the pages on the staging server NoIndexed. Then, when someone moves the site from the staging server to the live server, he forgets to remove the NoIndex tags. It is just normal human error in action.
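A simple pre-launch check can catch this kind of mistake before it costs traffic. The following is a minimal sketch, assuming the pages are NoIndexed with a robots meta tag in the HTML; the names `RobotsMetaParser` and `has_noindex` are hypothetical and introduced only for illustration. It does not look at engine-specific meta names or at the `X-Robots-Tag` HTTP header, which can carry the same directive.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives found in <meta name="robots"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (for example, <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        # Only the generic "robots" name is checked in this sketch.
        if attr_map.get("name", "").strip().lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the page's robots meta tag contains a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Example pages: one as it might look on staging, one as it should look when live.
staging_page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
live_page = '<html><head><meta name="robots" content="index, follow"></head></html>'

for label, page in (("staging", staging_page), ("live", live_page)):
    status = "NoIndexed" if has_noindex(page) else "indexable"
    print(f"{label}: {status}")
```

Running a check like this against the live pages right after a staging-to-live move would flag any NoIndex tags that were carried over by mistake.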
This type of problem can also emerge in another scenario. Some webmasters implement a robots.txt file that prohibits the crawling of their staging ...