
Hreflang and canonical international SEO test

8 March 2013


This is a guest post by Grosen Friis, SEO at OnlinePartners in Denmark.

Google’s hreflang option for international SEO has been available for more than a year now, so we decided it was time to conduct a clinical SEO test to see if it works as promised.

In addition to testing whether Google’s hreflang option affects how your web sites perform in Google’s country-specific indexes, e.g. Google.co.uk and Google.dk, we also tested whether hreflang can be combined with canonical in case you have problems with duplicate content on your web sites.

Why test the combination of hreflang and canonical?

Hreflang is very interesting for web sites that have more or less identical English content spread across different sub domains or country code top-level domains (ccTLDs), e.g. mydomain.co.uk for the UK and mydomain.ie for Ireland.

Hreflang can give you the following advantages when you, for example, run several web shops that each target a specific country but whose content is almost 100% identical and therefore causes major problems with duplicate content.

  1. You can get a better country-specific representation in Google’s search results, which many users no doubt appreciate. E.g. you get mydomain.co.uk to appear in search results in Google.co.uk instead of mydomain.com
  2. You also let Google help you send the user to the most relevant web shop, which increases the likelihood that the user immediately sees the right currency and price and lands on a web shop that can actually deliver to him/her. Imagine you have a web shop on mydomain.com and another on mydomain.co.uk, and assume that it is mydomain.com that appears in the Google.co.uk search results. This would send the user to a web shop that might show the wrong currency and price, and perhaps shipment to the UK is not possible from mydomain.com. Without hreflang you might need special features on each web shop that try to detect where in the world the user is located, based on e.g. his/her IP, and redirect him/her from e.g. mydomain.com to mydomain.co.uk

We also wanted to test hreflang in combination with canonical. On the one hand, Google states that you should combine them if you have problems with duplicate content; on the other hand, we have spoken with many SEOs who were not sure about this.
However, it does make sense to be able to combine hreflang and canonical.

  • If you have domains with unique content targeted at different countries, then you do not need canonical. Here you only need hreflang, which lets you tell Google how all your various domains are linked together across many countries.
  • If, on the other hand, you have identical content in the same language across multiple domains targeted at different countries where the same language is spoken, then it makes perfect sense to combine hreflang and canonical.

Test conducted on .com domain and related sub domains

We have used the following (sub)domains to conduct this test, and we encourage all to take a look at how they are set up.

Country        | Language | (Sub)domain
Not selected   | English  | http://href-lang.com
Australia      | English  | http://au.href-lang.com
United Kingdom | English  | http://uk.href-lang.com
Ireland        | English  | http://ie.href-lang.com

Structure of a test web site:

Within a single test web site, no two pages have duplicate content. This is ensured by the use of gibberish English, i.e. English words automatically and randomly selected for each page. However, if you compare the test web sites with each other, you will see that they are 100% identical across the four test (sub)domains.
Each test web site is set up as follows.

  • 5 levels:
    • Home page
    • Below the home page there are 3 levels and each has 9 sub-pages
    • 5th and lowest level consists of link-out-pages
  • The test web sites reside on 1 main .com domain and 3 related sub domains
  • Hosted on an IP address related to Denmark (77.66.30.208). You can check this yourself via ipligence.com/geolocation
  • The only link building done for the test web sites is from web sites related to Denmark
  • We deliberately chose to use sub domains instead of ccTLDs, as ccTLDs themselves give Google a strong signal of target country and language. That is not the case for a .com domain and related sub domains
  • Since the site: command seems to be phased out by Google and does not give a good overview of the indexing of the test web sites, we decided to submit all 4 test web sites to the same Google Webmaster Tools (GWT) account. We did not use GWT to “cheat” by setting a target country for each test web site inside GWT :) We only used GWT to monitor the indexing of each test web site.

Structure of and content on a page

Each page contains the following:

  • Title
  • Meta description
  • Hreflang and canonical
  • Breadcrumb
  • Main headline wrapped in <h1> tag
  • Sub headline wrapped in <h2> tag
  • 1-3 paragraphs wrapped in <p> tag
  • Navigation and outgoing links

Configuration of hreflang and canonical on a page

The configuration of hreflang and canonical on a page is as follows:

Country        | Language | (Sub)domain             | hreflang | canonical
Not selected   | English  | http://href-lang.com    | en       | Points to http://href-lang.com
Australia      | English  | http://au.href-lang.com | en-au    | Points to http://href-lang.com
United Kingdom | English  | http://uk.href-lang.com | en-gb    | Points to http://href-lang.com
Ireland        | English  | http://ie.href-lang.com | en-ie    | Points to http://href-lang.com

Example:

<link rel="alternate" hreflang="en" href="http://href-lang.com/chordospartium-pane.html" />
<link rel="alternate" hreflang="en-ie" href="http://ie.href-lang.com/chordospartium-pane.html" />
<link rel="alternate" hreflang="en-au" href="http://au.href-lang.com/chordospartium-pane.html" />
<link rel="alternate" hreflang="en-gb" href="http://uk.href-lang.com/chordospartium-pane.html" />
<link rel="canonical" href="http://href-lang.com/chordospartium-pane.html" />
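
The five lines above follow directly from the configuration table, so they can be generated per page rather than written by hand. The following is a minimal Python sketch of that idea, not the code used to build the test sites. The names `SITES`, `CANONICAL_BASE` and `head_links` are hypothetical; the locales and URLs are taken from the table.

```python
from html import escape

# hreflang value -> base URL, taken from the configuration table in this post.
SITES = {
    "en": "http://href-lang.com",
    "en-ie": "http://ie.href-lang.com",
    "en-au": "http://au.href-lang.com",
    "en-gb": "http://uk.href-lang.com",
}

# In this test every copy of a page canonicalises to the main .com domain.
CANONICAL_BASE = SITES["en"]


def head_links(path: str) -> list[str]:
    """Return the <link> lines for one page; the same lines go on all four (sub)domains."""
    path = "/" + path.lstrip("/")
    lines = [
        f'<link rel="alternate" hreflang="{lang}" href="{escape(base + path)}" />'
        for lang, base in SITES.items()
    ]
    # The canonical points to the same page on the main domain, not to its root.
    lines.append(f'<link rel="canonical" href="{escape(CANONICAL_BASE + path)}" />')
    return lines


if __name__ == "__main__":
    print("\n".join(head_links("chordospartium-pane.html")))
```

Calling `head_links("chordospartium-pane.html")` reproduces the example above. For sites with unique content per country you would drop the cross-domain canonical line, as the post argues.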

 

[Figure: the complete setup of a test page]

Google indexing from start until now

We conducted site: searches in Google and we watched the indexing in GWT.

Initially, both the main domain and the sub domains were indexed in Google, but when each sub domain reached approx. 80-110 indexed pages, the indexing stopped and began to roll back. I assume this is because Google’s bot first crawls the pages on the test web sites, and a separate routine later analyses other elements such as hreflang and canonical. Thus Google’s search results do not immediately reflect the use of hreflang and canonical.

As I write this blog post, GWT states that it has reviewed approx. 870 of the 901 pages on each sub domain and that only approx. 16-31 pages on each sub domain are still indexed in Google. We expect that to be fully adjusted in the near future. All in all, what we saw in GWT regarding the indexing of the 3 sub domains was as we expected.

Unfortunately the two screen dumps below are in Danish, as it was not possible for me to change the GWT interface from Danish to English.

  • Blue: Total pages indexed
  • Red: Total pages reviewed
  • Yellow: Total pages blocked from being indexed (e.g. via robots.txt)
  • Purple: Total pages removed

[Figure: GWT index status for uk.href-lang.com]

However, the indexing of the main domain was a bit of a surprise. Due to the use of hreflang and canonical, it seems as if GWT perceives the 4 test web sites as one single web site. The 4 test web sites consist of 4 x 901 pages = 3,604 pages, yet as this blog post is being written GWT states that 4,409 pages have been crawled and reviewed. That is approx. 800 pages more than actually exist on the 4 test web sites, and I have no immediate idea why GWT is so inaccurate on this specific number.

[Figure: GWT index status for href-lang.com]

Below is a list of how many pages Google has reviewed so far for each test web site, the maximum number of pages that have been indexed, and how many pages are currently indexed in Google.

Country        | (Sub)domain             | Pages reviewed | Pages indexed (maximum) | Pages indexed (at the moment)
Not selected   | http://href-lang.com    | 4,409          | 1,129                   | 895
Australia      | http://au.href-lang.com | 871            | 110                     | 34
United Kingdom | http://uk.href-lang.com | 870            | 83                      | 16
Ireland        | http://ie.href-lang.com | 869            | 74                      | 19
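
To make the numbers easier to check, here is a small Python sketch of the arithmetic behind them. It only uses figures quoted in this post; the variable and function names are hypothetical.

```python
PAGES_PER_SITE = 901   # pages on each test web site
NUMBER_OF_SITES = 4    # main domain + 3 sub domains
REVIEWED_MAIN = 4409   # "reviewed" figure GWT reported for href-lang.com

# Pages that actually exist across all four test web sites.
expected_total = PAGES_PER_SITE * NUMBER_OF_SITES

# How many more pages GWT reports than actually exist.
surplus = REVIEWED_MAIN - expected_total

# Pages currently indexed per sub domain, from the table above.
currently_indexed = {"au": 34, "uk": 16, "ie": 19}


def share_indexed(indexed: int, total: int = PAGES_PER_SITE) -> float:
    """Fraction of a site's pages that is still indexed."""
    return indexed / total


if __name__ == "__main__":
    print(expected_total, surplus)  # 3604 805
    for site, count in currently_indexed.items():
        print(site, f"{share_indexed(count):.1%}")
```

The surplus comes out at 805 pages, and each sub domain has only a few percent of its 901 pages left in the index, which is what you would expect if the canonical is consolidating them onto the main domain.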

Test results

We have conducted tests in Google’s country-specific indexes via both real people and tools:

  • Manual tests carried out by kind people in the SEO industry who are located on relevant geo-IPs (Australia, United Kingdom and USA)
  • Manual tests through a VPN / proxy that is based in a relevant country (Canada)
  • Impersonal.me
  • Software that measures the positions of a (sub)domain on selected keywords in specific Google country-indexes

The following search phrases were tested in Google’s different country-specific indexes. Please try for yourself by copying the search phrases from the fields below and pasting them into Google (consider including the double quotation marks, as this makes a test search in Google more accurate).

Level at test web site | Search phrase
1.                     |
1.5                    |
1.6.5                  |
1.7.1.8                |

All test search phrases showed the expected (sub)domains in Google’s search results:

Geo-IP         | (Sub)domain in SERPs
USA            | http://href-lang.com
Canada         | http://href-lang.com
Australia      | http://au.href-lang.com
United Kingdom | http://uk.href-lang.com
Ireland        | http://ie.href-lang.com
Denmark        | http://href-lang.com
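
The pattern in this table can be summarised as "most specific match wins": a country with its own language-country hreflang gets its sub domain, and every other country falls back to the language-only entry. Below is a minimal Python sketch of that reading. It is a simplified model of what we observed, not Google’s actual selection logic, and it assumes the searcher is served English results. `ALTERNATES` and `expected_site` are hypothetical names.

```python
from typing import Optional

# hreflang value -> (sub)domain, as configured on the test web sites.
ALTERNATES = {
    "en": "http://href-lang.com",
    "en-au": "http://au.href-lang.com",
    "en-gb": "http://uk.href-lang.com",
    "en-ie": "http://ie.href-lang.com",
}


def expected_site(language: str, country: str) -> Optional[str]:
    """Return the (sub)domain expected in the SERPs for a searcher.

    Tries the language-country pair first (e.g. "en-gb"), then falls back
    to the language-only entry (e.g. "en"). Returns None if neither exists.
    """
    exact = f"{language}-{country}".lower()
    if exact in ALTERNATES:
        return ALTERNATES[exact]
    return ALTERNATES.get(language.lower())


if __name__ == "__main__":
    for country in ["US", "CA", "AU", "GB", "IE", "DK"]:
        print(country, expected_site("en", country))
```

Run against the six geo-IPs in the table, the sketch returns the same (sub)domains that appeared in the test searches.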

Conclusion

  • Can you use Google’s hreflang for international SEO? Yes
  • If you have problems with duplicate content, should you then combine hreflang with canonical? Yes
  • If you do NOT have problems with duplicate content, should you then also combine hreflang with canonical? No

Finally, I would like to add that it used to be a bad idea to let the three sub domains, or equivalent ccTLDs, be indexed in Google because of the problems with duplicate content. At the same time, it was almost impossible to get anything other than the main .com domain to appear in search results, even when searching in Google’s country-specific indexes. Thanks to hreflang and canonical, this is now possible.

Please note that we also present the results from this test in this YouTube video


This post was written by an author who is not a regular contributor to State of Digital. See all the other regular State of Digital authors here. Opinions expressed in the article are those of the contributor and not necessarily those of State of Digital.
  • GerardGallegos

    Hi! Nice study. I tried by myself and it is definitely giving me the right sub-domain for each country.

    I’m currently facing the same situation. Or similar, in our case we are working on ccTLD and we will also need to implement canonicals.

    I’m assuming hreflang points the base language but the canonical kind of replace that “duplicate” to the main domain again. How were the canonical on the subdirectory pages?

    • http://www.onlinepartners.dk/blog Grosen Friis

      Hi GerardGallegos

      >> “I’m assuming hreflang points the base language but the canonical kind of replace that “duplicate” to the main domain again.”

      Correct, but hreflang only specified the language on the main domain, whereas it specified both language and country on the subdomains

      >> “How were the canonical on the subdirectory pages?”

      Each single page on the subdomains (or could be ccTLDs) points to the exact same page on the main domain via canonical. So if you have e.g. these pages.

      – mydomain.com/lamps/my-new-ceiling-lamp.html
      – au.mydomain.com/lamps/my-new-ceiling-lamp.html
      – ie.mydomain.com/lamps/my-new-ceiling-lamp.html
      – uk.mydomain.com/lamps/my-new-ceiling-lamp.html

      Then each of the four (sub)domains points to all four via hreflang, telling Google what language and country (if applicable) each (sub)domain relates to. So if Googlebot visits

      – au.mydomain.com/lamps/my-new-ceiling-lamp.html

      then it would find four lines of hreflang pointing out language and country settings for all four (sub)domains. Same thing if Googlebot visits one of the three other (sub)domains.

      In addition, the page

      …/lamps/my-new-ceiling-lamp.html

      points to the same page via both canonical and hreflang; they do NOT point to the root of each domain.

      I suggest you go to href-lang.com and look at the home page and on some of the sub pages and take a look at how both hreflang and canonical are configured on each page in the HTML code (head section).

      /Grosen Friis

      • GerardGallegos

        That is really useful. Thank you very much.

        Definitely hreflang is going to solve a lot of problems.

        thanks again

        • http://www.onlinepartners.dk/blog Grosen Friis

          Hi GerardGallegos

          You are welcome and I agree – hreflang is going to solve a lot of language/country problems

          /Grosen

  • http://twitter.com/steviephil Steve Morgan

    Great article and great research, Grosen!

    For a while now, I’ve been supporting a client of mine with hreflang implementation. The canonical + hreflang side of things has had me a little concerned, especially given the concerns of other SEOs and the fact that Google changed its stance on it.

    The client’s situation is particularly tricky – in addition to multiple English language sites for different countries, they also have to have two different versions of the site within each country: one for professionals and one for non-professionals. Unfortunately it’s a necessity (legal/compliance reasons) and so I’ve been wondering about how to handle this from a duplicate content standpoint. A fellow SEO told me that I could canonicalise each section to itself (e.g. .com/uk/prof canonicalising itself and .com/uk/nonprof also canonicalising itself, etc.), but I wasn’t sure this was even do-able. How would you go about it from a canonical standpoint? I’d appreciate your thoughts (and those of any other readers/commenters, too).

    I also found it interesting that hreflang worked as intended without geotargeting configuration (which I think you mentioned in your supplementary YouTube video). I think that’s a major find in itself, given that there are those who believe geotargeting is a necessity in order for hreflang to work properly.

    Keep up the good work! :-)

    • http://www.onlinepartners.dk/blog Grosen Friis

      Hi Steve Morgan

      In this situation you have two websites targeting two different kinds of users, i.e. professionals and non-professionals, but within the same country or the same language. Getting both websites indexed is harder because you do not have hreflang to help you differentiate e.g. the non-professionals website from the professionals website. So in my opinion you have the following options:

      1. Do the hard job of rewriting the content on e.g. the non-professionals website so it is different from the professionals website.

      2. If the content on each page is limited, e.g. 100-200 words describing a product or similar, then you could consider adding a unique ‘About us’ spinner text to each page on e.g. the non-professionals website. This new content could be placed behind a ‘Read more’ link. This way the existing 100-200 words no longer make up 100% of the content on the page, but e.g. 40-50%. In addition you would need to rewrite all titles and meta descriptions. Quality spinner texts are in my opinion a great way to fight both thin and duplicate content. A quality spinner text of 400 words takes approx. 10 hours to write and proof-read, but after that you can spin thousands of unique versions from it.

      3. Set up canonical on e.g. the non-professionals website so that it points to the professionals website, making sure each page on the non-professionals website points to the exact same page on the professionals website. In addition I would add both websites to the same Google Webmaster Tools account. Doing that, you tell Google how the two websites are related. Here you would never get problems with duplicate content, but you would not get the non-professionals website indexed at all. So you might need a clear call-to-action in the design template that helps professionals stay on the professionals website when they arrive via Google SERPs, and helps non-professionals move on to the non-professionals website once they have found the professionals website via Google SERPs.

      >> “I also found it interesting that hreflang worked as intended without geotargeting configuration (which I think you mentioned in your supplementary YouTube video). I think that’s a major find in itself”

      Yes, that is correct, and I agree it is great that some content in a given language can be targeted at specific countries whereas other content in the same language can be targeted at the language only. And you can do that with both unique and duplicate content.

      /Grosen Friis


  • http://www.apptechdesigner.com/blog Marco Russo

    Hi Grosen,

    Does hreflang only work on .com?

    I ask because I have domain.de with the same content as domain.at and domain.ch (German version)
    And I have domain.fr with the same content as domain.lu, domain.be and domain.ch (French version)

    And I have .com (English) for all other countries where English is spoken

    How do I add those lines for each web page and each piece of content?
    Thanks
    Marco


  • http://www.andredittmar.de Andre Dittmar

    Hello,
    Great presentation (I saw the YouTube video) and test cases which really explain the usage of href-lang in combo with canonical!

    Nevertheless I have a question:

    In 3:10 you mentioned countrycode TLDs. We sometimes use them for multi-country websites, so the question is if hreflang is still required then, since Google already know then that a (identical) german content for example.at is for Austria, example.de for Germany and example.ch for Switzerland?

    I also assume that canonical urls are only required if you use the same TLD? So by my understanding Google don’t care if I have example.ch/product-1.html and example.de/product-1.html with duplicate content, since these are two different (country-specific) domains, right?

    But it cares if we have
    – example.ch/cat-1/product-1.html (this might be the “main” page)
    – example.ch/product-1.html
    – example.ch/cat-1/filter/latest/product-1.html
    – example.ch/promo/product.html
    and they contain the identical content. So the canonical is set on all pages except the “main” like , right?

    Thanks for feedback!

    Kind regards
    Andre


  • Callum

    Hey Grosen,

    You may want to look at whether this statement is correct: “If you on the other hand have identical content in the same language across multiple domains targeted different countries where they speak the same language, then it makes perfect sense to combine hreflang and canonical.”

    In contradiction to your statement above, Google clearly states the following: “We recommend not using rel=canonical across different language or country versions. Using it within the same language/country version is fine and one of the recommended ways of handling canonicalization.” Source: https://sites.google.com/site/webmasterhelpforum/en/faq-internationalisation

    Also, try testing your site (href-lang.com) with the following tool: http://flang.dejanseo.com.au/

    As you can see from the attached screenshot, the above tool returns a “Canonical conflict” error, which suggests this tool abides by Google’s instructions.

    Any thoughts?

    • Callum

      Addendum: Looks like my screenshot did not get included in my original comment. I will try to attach again and, if it does not show, you can try the tool for yourself to see the results.

    • Callum

      The relationship between canonical URLs and hreflang is spelled out clearly in this Google video around the 8:43 mark: https://support.google.com/webmasters/answer/189077

      Based on this information, the relationship between canonical URLs and hreflang on the test site (href-lang.com) appears to be incorrect.
