Hidden Content

Sometimes publishers produce great content and then, for one reason or another, fail to expose that content to search engines. In “Content Delivery and Search Spider Control” in Chapter 6, we discussed ways that you can hide content from the search engines when you want to. At times, however, content is hidden unintentionally. Valuable content can be inadvertently hidden from the search engines, and occasionally the engines can find hidden content and construe it as spam, whether that was your intent or not.

Identifying Content That Engines Don’t See

How do you determine when this is happening? Sometimes the situation is readily apparent. For example, if you have a site that receives high traffic volume and your developer accidentally NoIndexes every page on the site, you will begin to see a catastrophic drop in traffic. Most likely this will set off a panicked investigation that identifies the NoIndex issue as the culprit.

Does this really happen? Unfortunately, it does. Here is an example scenario. You work on site updates on a staging server. Because you don’t want the search engines to discover this duplicate version of your site, you keep the pages on the staging server NoIndexed. Then, when someone moves the site from the staging server to the live server, they forget to remove the NoIndex tags. It is just normal human error in action.
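One way to catch this kind of mistake before it costs you traffic is to scan the HTML of your live pages for a robots meta tag that contains a noindex directive. The following Python sketch is a minimal illustration, not a complete audit tool. The names RobotsMetaParser and has_noindex and the sample pages are hypothetical, and the check covers only the robots meta tag; it does not look at HTTP headers or robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr_map.get("content", "").split(","):
            self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the page carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical sample pages, standing in for HTML fetched from a live site.
pages = {
    "/": '<html><head><meta name="robots" content="noindex, nofollow"></head></html>',
    "/about": '<html><head><meta name="robots" content="index, follow"></head></html>',
}

for path, html in pages.items():
    if has_noindex(html):
        print(f"WARNING: {path} is NoIndexed")
```

Running a check like this against a handful of key URLs right after each deployment from staging to live is one possible way to surface a leftover NoIndex tag before the search engines act on it.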

This type of problem can also emerge in another scenario. Some webmasters implement a robots.txt file that prohibits the crawling of their staging ...
