Duplicate Content – SEO Basics

Duplicate content – What is it?

Duplicate content is a term the search engines use a lot. You will hear SEO companies and consultants harping on about you having duplicate content. You’re probably asking why, and why you should care?

Well, if you want to rank well in Google/Bing/Yahoo, you will need to care, as search engines hate duplicate content, and there is a very good reason why they hate it.

How do you define duplicate content?

Like many people, I thought duplicate content was a piece of content that appears as a carbon copy, an exact match, at more than one URL on the internet. Well, this is only partly true. It took me a while to research and work out what search engines actually class as “duplicate content“.

The reason search engines don’t like duplicate content is that it is difficult for them to recognise which version is more current and relevant to a search query. So the search engines will bypass the duplicate content and rank something less ambiguous instead. Your website can receive a penalty which could affect your position, and you may not even know this is happening.

Three of the biggest issues with duplicate content include:

  1. Search Engines don’t know which version(s) to include/exclude from their indices.
  2. Search Engines don’t know whether to direct the link metrics (trust, authority, anchor text, link juice, etc.) to one page, or keep them separated between multiple versions.
  3. Search Engines don’t know which version(s) to rank for query results.

A few years ago one of the common practices was to build lots of microsites with the same content pointing back to your main site. These could be link wheels or just bog-standard single-page websites. This resulted in the search engines getting a little fed up, as it made it very complicated to separate the genuine content from the artificial content.

A lot of spam sites were created, skewing the search results, which meant the search engines were not providing the correct results for the search strings you entered.

So some big updates came from the search engines, and many sites which had duplicate content were dropped from the top spots automatically. This happened to our site briefly; we did not have link wheels or microsites, but we did have a lot of content which appeared to be the same, so it does happen to the best of us.

Luckily there are some great tools out there which can help you track your duplicate pages, such as Copyscape. Copyscape can instantly find your content anywhere on the web, and it is very good for finding anyone copying your content.

The question is, what do you do when you find duplicate content?

For example:

www.example.com/page.html
www.example.com/page-local.html

This is where you need to establish which of the two pages should rank and bring you the most traffic. First, look at the PageRank value of each page in Google; whichever scores highest is the one to keep. However, even if “page-local.html” has the lower PageRank, it could still have some important backlinks. Best practice is not to delete the page but to create a permanent redirect, commonly known as a “301 redirect”. A 301 redirect tells the search engine that “page-local.html” no longer exists and has moved permanently to “page.html”.

Mr Search Engine – http://www.example.com/page-local.html has permanently moved to http://www.example.com/page.html.

So Google/Yahoo/Bing know that the content previously on http://www.example.com/page-local.html has now moved to http://www.example.com/page.html, and that this is the correct page to index and check for the latest content.
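If your site runs on an Apache web server, a single-page 301 redirect like the one described above can be set up with one line in your .htaccess file. This is only a sketch using the example URLs from this post – as always, check with your hosting provider before editing .htaccess.

# Permanently (301) redirect the duplicate page to the page we want to keep
Redirect 301 /page-local.html http://www.example.com/page.html

The Redirect directive comes from Apache’s mod_alias module, so it works even where mod_rewrite is not available.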

So you think you are finished on the duplicate content side of things… Not quite! There is more, and it gets a little complicated. Believe it or not, this was announced back in 2009, so it’s been around for a very long time.

However… search engines also treat the www and non-www versions of your domain as duplicate content. For example:

  • www.example.com
  • example.com
  • www.example.com/
  • example.com/index.html
  • www.example.com/index.html

Above is how the search engines can see your site: five different versions of the same content, depending on what is typed into a search engine or address bar. Each version could even serve different content, so once again, how do the search engines know which is the correct information? You could also have backlinks going to each of the five versions. (Backlinks are a very important factor in SEO; we’re not going to touch on them in this blog, but we will soon.) Like me, you’re probably wondering how on earth you manage and rectify this.

In Google Webmaster Tools you can also set a preferred domain (www or non-www).

The process to resolve this duplication is called “redirecting to a canonical hostname”. It also ties into your link-building strategies, but that is another topic.

Like the page redirect, you now need to redirect all non-www URLs to the www versions, both at the top-level domain and throughout your site. This can prove a very long-winded process, but if you’re running your site on a Linux web server you can easily do it in one hit by adding a condition to your .htaccess file.

Below is an example we used on our web hosting platform for our website.

<IfModule mod_rewrite.c>
  RewriteEngine on
  # If the requested host does not start with "www." (case-insensitive)...
  RewriteCond %{HTTP_HOST} !^www\. [NC]
  # ...permanently (301) redirect to the www version of the same URL
  RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
</IfModule>
(PLEASE BE VERY CAREFUL – please check with your web hosting technical support for the correct way to set up this redirect.)
This tells the server to permanently redirect any user going to example.com to www.example.com.
The last port of call is to add a canonical link to every page. Search engines like this because the tag tells Bing and Google that the given page should be treated as though it were a copy of the URL www.example.com/page.html, and that all of the link and content metrics the engines apply should actually be credited to that URL.
An example of the link element:
<link rel="canonical" href="http://www.example.com/page.html"/>
Summary
You may never completely get away from duplicate content, as people often link to your site. However, link consistently, be mindful not to copy pages, use 301 redirects, and add the canonical tag to your website. For more information on canonical links, please take a look at Google Webmaster Support.
If you would like more information, please get in contact via our website, or you can follow us on Twitter.

This entry was posted in SEO Articles by Cocoonfxmedia.

About Cocoonfxmedia

I am an expert in search engine optimisation (SEO), web design and development. I am always looking at how to improve and develop ideas, and I provide web design services around Tamworth, Lichfield, Sutton Coldfield and Birmingham.
