More and more site owners are concerned that they might be getting penalised, accidentally or otherwise, because of duplicate content.
Do they have cause for worry? Certainly, many in-house and external SEOs have experienced duplicate content issues, and 90% of those arise through natural means such as syndicating content or catalogue-driven product pages that vary only in the colour of the product.
This session looks at the various issues and explores potential solutions.
Mikkel Svendsen and the Myths surrounding Duplicate Content
1. You don’t have to deal with duplicate pages, as search engines handle them just fine
They do; however, you need to be aware that search engines can filter out important pages, which can lead to a loss of organic traffic.
2. Google will brutally punish you for duplicate content
The key factor is to understand the difference between being punished and being filtered.
The difference is crucial to how you fix it.
3. If it ain’t broke don’t fix it
People seem to think duplicate content is OK if you are not getting filtered or punished. However, Mikkel says it is like a landmine: people think it’s fine as long as they don’t step on it, but sooner or later it will explode.
4. Duplicate content is only a problem across domains – not within your own domain
Search engines will try to filter out duplicate pages if they pollute the index. When duplication doesn’t make sense from a user’s point of view, they will filter it.
A website should never be accessible on more than one domain – only your brand (canonical) domain.
Make sure it is only accessible through one subdomain at a time.
Password-protect test sites or implement a robots.txt file; Mikkel says an old test site with exactly the same content can take a while to filter out if you haven’t blocked search engines from indexing its pages before the launch of the real site.
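For a test site, the robots.txt route is a two-line file at the root of the test host (a sketch; password protection is the stronger option, since robots.txt only blocks crawling, it doesn’t remove pages already indexed):

```
User-agent: *
Disallow: /
```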
HTTP or HTTPS
Search engines do index both, so make sure only one version is accessible.
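On Apache, this kind of host canonicalisation is typically enforced with a 301 redirect in .htaccess (a sketch, assuming mod_rewrite is enabled; example.com is a placeholder for your brand domain):

```apache
RewriteEngine On
# Send any non-canonical host (non-www, test subdomains, etc.)
# to the single canonical www host with a permanent redirect
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```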
5. Just implement the canonical tag and everything will be fine
Problems with canonical tag
– Works more slowly than a 301
– Requires the page to be crawled and indexed first – a 301 doesn’t
– Doesn’t work perfectly yet; there have been too many examples of it failing even when implemented perfectly
– You have to identify all the duplicate content anyway
– You have to know where the problem is anyway, so why not fix it at the source?
– Google’s parameter handling is another option
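For reference, the tag in question is a single link element in the page head, pointing at the preferred version of the URL (the href here is a placeholder):

```html
<link rel="canonical" href="http://www.example.com/red-shoes" />
```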
6. If you stay below a certain % of duplicate text then you will be fine
There is a lot more to it than just a benchmark percentage.
It is about context and link data (which I find an interesting theory).
The example Mikkel gives: if a piece is featured in two papers, the context is taken into account. They both have different audiences, and by analysing their link data search engines can see that the two sites operate in very different circles, so they can determine that, although the content is very similar, it warrants being treated as unique.
Search engines also filter out boilerplate content (such as disclaimers and footer copy) and other content repeated across many pages. They don’t always filter out the entire page; they just ignore the duplicate content they find.
- Tracking – parameters such as utm_source codes
- Session IDs – used in place of cookies
How to fix campaign and affiliate tracking URLs
Add this piece of code to your GA tracking script:
Change the ? after the URL to a # and this will stop search engines indexing those URLs.
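The actual GA snippet didn’t make it into my notes, but the idea behind it can be sketched in Python (the function name and URLs are my own, purely illustrative): everything after a # is a URL fragment, which search engines ignore, so campaign parameters moved behind a # no longer create distinct indexable URLs.

```python
from urllib.parse import urlsplit

def hide_tracking_params(url: str) -> str:
    """Rewrite ?utm_...=... query strings as #utm_...=... fragments.

    Search engines ignore the fragment, so the page is seen as one URL
    rather than one URL per campaign variant.
    """
    parts = urlsplit(url)
    if not parts.query:
        return url  # nothing to hide
    base = parts.scheme + "://" + parts.netloc + parts.path
    return base + "#" + parts.query

print(hide_tracking_params(
    "http://example.com/page?utm_source=newsletter&utm_medium=email"))
# -> http://example.com/page#utm_source=newsletter&utm_medium=email
```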
Avoid duplicate content in your RSS feeds
- Never put the entire post in your feed
- Use an abstract
- In WordPress, use the “more” function
7. It is virtually impossible to stop all internal duplicate content
The catch-all solution
Unfortunately I didn’t catch all of his slides as he was running out of time, but I am sure he will share them if you tweet him @demib.
Fantomaster didn’t have a presentation but instead spoke to us freely about his thoughts on duplicate content. This was remarkably refreshing, as it was more like one of those “An Audience With…” talk shows. Luckily, having seen Fantomaster speak many times before, I knew he is captivating, and his years of experience shine through.
I thought it would be valuable if I picked out some of the best quotes and most thought-provoking ideas he touched on…
Looking at it in an empirical, scientific manner – when it comes to duplicate content, we SEOs are just groping around in the dark.
We obviously get no specific details from the search engines, so we are just making educated and logical guesses.
Some forms of duplicate content occur through the search engines’ use of stop words. If content is stripped of pronouns and prepositions, then a lot of content might look similar.
If you don’t want stop words to be stripped out of a query, put the phrase in inverted commas or use plus signs.
Just because it reads differently to you does not mean it reads differently to search engines.
If you use different tools you will get different results – the % similarity of content cannot be determined reliably, so the best thing to do is make your content as unique as possible.
Google has given some information on what they look for, if you want to believe them: shingles technology – phrases that are compared rather than singular words – plus synonyms and LSI.
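A minimal sketch of the shingling idea (my own toy version, not Google’s implementation): slice each text into overlapping word phrases and score the overlap between the two sets, e.g. with Jaccard similarity. It also shows why merely swapping a key phrase barely moves the score.

```python
def shingles(text: str, w: int = 3) -> set:
    """Return the set of w-word shingles (overlapping phrases) in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(len(words) - w + 1)}

def similarity(a: str, b: str, w: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Swapping one key phrase leaves most shingles shared, so the
# pages still look like near-duplicates.
print(similarity("red shoes in many sizes", "blue shoes in many sizes"))
```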
Duplicate content often happens due to scaling.
Red shoes, black shoes, blue shoes – mainly an issue for catalogue-driven sites.
Not pushing you into hell just not letting you into heaven
Switching key phrases in content does not work – that is not how search engines determine duplicate content.
Consider adding below-the-fold pieces of copy on shoe history, shoes in ballet, etc., to make pages that are more often than not very similar unique.
Put the copy high up in the CSS/HTML structure but have it appear below the fold.
Vary product descriptions.
20% of 100,000 pages can end up orphaned – due to a poor CMS or poor user knowledge.
Automatically generated content is one option for varying content, but 99% of it is illiterate – poor for conversion.
Link building is another area where duplicate content happens.
Where people fail the most is in titles – you need to be language-savvy to succeed; synonyms won’t cut it. Think in terms of phrases rather than individual words.
Ensure variance – the objective is links, not a Shakespeare prize for writing.