Chapter 5. Duplicate Content
We humans often find it frustrating to listen to people repeat themselves, and search engines are similarly "frustrated" by web sites that do the same. This problem is called duplicate content: web content that either exactly duplicates, or is substantially similar to, content located at different URLs. By definition, duplicate content contains nothing original.
This is important to realize. Originality is a key factor in the human perception of value, and search engines factor such human sentiments into their algorithms. A user confronted with several copies of the same content in a set of search results is not well served. Accordingly, search engines employ sophisticated algorithms that detect such content and filter it out of their results.
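The precise detection algorithms are proprietary, but the underlying idea can be sketched. The following toy PHP fragment compares two blocks of text by breaking each into overlapping word "shingles" and computing the Jaccard index of the two shingle sets; the function names, the shingle size, and the sample texts are our own illustrative choices, and production engines rely on far more scalable hashing and fingerprinting techniques built on the same principle.

<?php
// Break a text into overlapping $size-word shingles, returned as a set
// (shingle => true). Punctuation is stripped and case is normalized so
// that trivial differences do not mask duplication.
function shingles($text, $size = 4)
{
    $words = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $set = array();
    for ($i = 0; $i <= count($words) - $size; $i++) {
        $set[implode(' ', array_slice($words, $i, $size))] = true;
    }
    return $set;
}

// Jaccard index: size of the intersection of the two shingle sets
// divided by the size of their union. 1.0 means identical sets.
function jaccard($a, $b)
{
    $sa = shingles($a);
    $sb = shingles($b);
    $intersection = count(array_intersect_key($sa, $sb));
    $union = count($sa + $sb); // "+" merges keys from both sets
    return $union > 0 ? $intersection / $union : 0.0;
}

$pageA = 'The quick brown fox jumps over the lazy dog near the river bank.';
$pageB = 'The quick brown fox jumps over the lazy dog by the river bank.';

// A search engine would treat pages scoring above some (secret)
// threshold as near-duplicates and show only one of them.
printf("Similarity: %.2f\n", jaccard($pageA, $pageB));
?>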
Indexing and processing duplicate content also wastes a search engine's storage and computation time. Aaron Wall of http://www.seobook.com/ states that "if pages are too similar, then Google [or other search engines] may assume that they offer little value or are of poor content quality." As a result, a web site may not get spidered as often or as comprehensively. And though it remains a point of contention in the search engine marketing community whether the various search engines apply an explicit penalty, everyone agrees that duplicate content can be harmful.
Knowing this, it would be wise to eliminate as much duplicate content as possible from a web site. This chapter documents ...
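To preview one of the simplest remedies: a classic source of duplicate content is a site that answers at both its www and non-www host names, leaving every page reachable at two URLs. A minimal PHP sketch (the example.com host name is hypothetical) collapses the two with a 301 redirect so that search engines see a single canonical URL:

<?php
// If the visitor (or spider) arrived via the non-www host, issue a
// 301 (permanent) redirect to the same path on the canonical
// www host. Adjust 'example.com' to your own canonical form.
if ($_SERVER['HTTP_HOST'] == 'example.com') {
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: http://www.example.com' . $_SERVER['REQUEST_URI']);
    exit();
}
?>

Because a 301 signals a permanent move, search engines consolidate the two addresses into one indexed URL rather than splitting ranking value between them.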