Don't Get Duped by Duplicate Content

Definition
Duplicate content is exactly what it sounds like: two or more occurrences of identical or nearly identical content on the web. As a rough rule of thumb, if about 60% of a page's content matches another page, the search engines will likely treat the two as duplicates of each other.
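Search engines do not publish how they detect near-duplicates. To make the 60% figure concrete, here is a minimal sketch of one well-known technique: word shingling combined with Jaccard similarity. The function names and the 5-word shingle size are hypothetical choices. The 0.6 threshold simply mirrors the rough estimate above and is not a published search-engine setting.

```python
import re


def shingles(text: str, k: int = 5) -> set:
    """Return the set of k-word shingles (overlapping runs of k words) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Texts shorter than k words become a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles found in either set."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_duplicate(page_a: str, page_b: str, threshold: float = 0.6) -> bool:
    """Hypothetical rule: flag two pages whose shingle overlap meets the threshold."""
    return jaccard(shingles(page_a), shingles(page_b)) >= threshold


if __name__ == "__main__":
    original = "Our blue widget is made of steel and ships free to all fifty states."
    reposted = "Our blue widget is made of steel and ships free to all fifty states today."
    unrelated = "Contact us for a quote on industrial pumps and valves."
    print(looks_duplicate(original, reposted))   # True: 10 of 11 shingles are shared
    print(looks_duplicate(original, unrelated))  # False: no shingles are shared
```

Adding one word to a copied page barely moves the score. That is why lightly edited copies still tend to be filtered as duplicates.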

Why It’s a Problem
For search engines, a satisfying customer experience is the main goal. Duplicate content undermines that experience by filling a results list with the same content over and over again, which is not a good way to build searcher loyalty. Duplicate content also wastes crawling bandwidth and index storage. To address the problem, search engines apply a variety of filters during crawling and indexing that weed out duplicates and rank only the version they consider the “original” source of the content.

For websites, this is a problem because the search engines’ de-duplication process is by no means perfect. Another copy of your content may be ranked instead of the one on your site, which can cost you traffic that is rightly yours.

Common Causes of Duplicate Content
There are several ways that multiple versions of the same content can be created:

Multiple URLs pointing to the same content, for example: domain aliases (additional domain names that point to the same content); mirrored sites (an exact copy of your site, an SEO no-no); or a change in URL structure (for example, a site redesign that changes URLs from www.yoursite.com/about to www.yoursite.com/?about while the old URLs still resolve)
Similar content on different pages, for example two pages describing the same product
Printer-friendly versions of content (the URL is different, but the content is the same except for the removal of navigation and graphics)
Content syndication (articles, RSS feeds) that results in your content appearing on other sites
Canonicalization issues (the same pages reachable at both http://www.yoursite.com and http://yoursite.com)
Session IDs embedded in URLs (each visitor gets a different URL for the same page)
Scraper sites (sites that copy content from other sites, usually to create MFA (“Made for AdSense”) sites designed solely to make a profit by generating clicks)
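Several of these causes (domain aliases, www vs. non-www, session IDs) come down to many URLs resolving to one page. The minimal Python sketch below collapses such variants into a single canonical URL. The preferred host, the alias list, and the session-parameter names are hypothetical settings chosen for illustration. They are not values any search engine uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical site settings, assumed for illustration only.
PREFERRED_HOST = "www.yoursite.com"
ALIAS_HOSTS = {"yoursite.com", "www.yoursite.net"}  # assumed domain aliases
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}   # assumed session-ID parameter names


def canonical_url(url: str) -> str:
    """Collapse common duplicate-URL variants into one canonical form."""
    parts = urlsplit(url)
    # .hostname is already lowercased by urlsplit; ports are ignored in this sketch.
    host = parts.hostname or ""
    if host in ALIAS_HOSTS:
        host = PREFERRED_HOST
    # Drop session-ID parameters, then sort the rest so that parameter
    # order alone does not create extra variants.
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in SESSION_PARAMS
    )
    scheme = parts.scheme.lower() or "http"
    # The fragment (#...) is never sent to the server, so it is dropped.
    return urlunsplit((scheme, host, parts.path or "/", urlencode(query), ""))


if __name__ == "__main__":
    variants = [
        "http://yoursite.com/about",
        "http://WWW.YourSite.com/about?sid=abc123",
        "http://www.yoursite.net/about?PHPSESSID=xyz",
    ]
    for u in variants:
        print(canonical_url(u))  # each prints http://www.yoursite.com/about
```

Search engines face the same problem from the outside. They must guess which variant is the real one, which is why the practices below try to make only one variant reachable.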

Ways to Avoid Duplicate Content
The list below is not exhaustive, but these best practices will help you avoid creating duplicate content and confusing the search engines:

Use 301 (permanent) redirects whenever possible to point duplicate domains and URLs to a single source of content
Use robots.txt Disallow rules to tell spiders not to crawl duplicate versions of content
Store session IDs in cookies rather than in unique URLs
Reevaluate similar pages: do you really need both, or can they be combined?
Be the authoritative source of your syndicated content: ask that absolute links back to you be included in any feeds or articles
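The first two practices can be illustrated with a minimal Python sketch. It has two parts: a WSGI handler that answers requests for any alias host with a 301 redirect to one preferred host, and a check that uses the standard library's urllib.robotparser to confirm that a Disallow rule keeps spiders out of a duplicate section. The host name and the /print/ path for printer-friendly pages are assumptions. In practice, the redirect usually lives in your web server configuration rather than in application code.

```python
from urllib.robotparser import RobotFileParser

PREFERRED_HOST = "www.yoursite.com"  # hypothetical preferred domain


def redirect_app(environ, start_response):
    """Minimal WSGI app: 301-redirect any other host to the preferred host."""
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()
    path = environ.get("PATH_INFO", "/") or "/"
    query = environ.get("QUERY_STRING", "")
    if host != PREFERRED_HOST:
        location = f"http://{PREFERRED_HOST}{path}" + (f"?{query}" if query else "")
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"The one canonical copy of this page.\n"]


# Hypothetical robots.txt that keeps all spiders out of printer-friendly copies.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def crawl_allowed(url: str, agent: str = "*") -> bool:
    """Return True if ROBOTS_TXT lets the given user agent crawl url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    # Serve locally on port 8000 (an arbitrary choice for testing).
    make_server("", 8000, redirect_app).serve_forever()
```

A 301 tells spiders that the old address has moved permanently, so the redirect target is the one they should index. A Disallow rule only stops crawling; it does not say which copy is the original.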

Duplication Nation
While everything these days seems to be about copying and redistributing the latest thing ad nauseam to make a quick buck (e.g., reality shows, boy bands, low-carb diets), the same does not hold true in the search engine world. The only way to benefit here is to claim sole ownership of your content and make it very clear to Google, Yahoo and the others that your version is the only one that should be shown in the results.

Thursday, February 01, 2007