
URL Canonicalisation: How to Diagnose & Rectify Errors

22 October 2013, by Ned Poulter


A couple of months back, I provided some advice for an article on TheNextWeb entitled ‘How to Improve Your Site’s SEO’, which focused on tips for small businesses. One of my tips was to check that your domain resolves at the correct URL, something we SEOs refer to as canonicalisation.

In that article I also said that anyone having issues with canonicalisation errors could get in touch with me. Many did, so this post will:

  1. Explain what canonicalisation is
  2. Identify the (adverse) implications for search
  3. Show how to uncover these problems
  4. Explain how to fix them, if you have them

Let’s get started…

Canonicalisation: Definition

“So what is canonicalisation then?” I hear you ask. Canonicalisation issues occur when the same (or very similar) content can be accessed at multiple URLs. They are often caused by default web server settings. Take the seemingly innocent list of URLs below:

  • www.example.com
  • example.com/
  • www.example.com/index.html
  • example.com/home.asp

To a visitor these look like the same page, but a search engine treats each one as a separate URL that could serve different content. As Google’s Matt Cutts put it:

“When Google “canonicalizes” a url, we try to pick the url that seems like the best representative from that set.” (Matt Cutts, 2006)

However, as with a lot of things in SEO, in order to do this most effectively we’ve got to help the search engines out a little bit.
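If you want to see the idea in code, here is a minimal Python sketch that collapses the four example URLs above into one preferred form. It assumes www.example.com is the preferred host and that index.html and home.asp are simply aliases for the homepage; both are illustrative assumptions, and `canonicalise` is a hypothetical helper, not a description of how Google actually picks a canonical URL.

```python
from urllib.parse import urlsplit, urlunsplit

# Illustrative assumptions: www.example.com is the preferred host, and these
# filenames are default-document aliases for a directory's index page.
PREFERRED_HOST = "www.example.com"
HOST_ALIASES = {"example.com", "www.example.com"}
DEFAULT_DOCUMENTS = {"index.html", "home.asp"}


def canonicalise(url: str) -> str:
    """Map one URL variant onto the single preferred (canonical) URL."""
    # Add a scheme if it is missing so urlsplit() recognises the host.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    # .hostname is already lowercased; any port number is dropped here.
    host = parts.hostname or ""
    # Treat the bare domain and the www. domain as the same site.
    if host in HOST_ALIASES:
        host = PREFERRED_HOST
    path = parts.path or "/"
    # Drop a trailing default-document filename, so /index.html becomes /.
    head, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"
    # Keep the query string; discard the fragment, which never reaches the server.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


variants = [
    "www.example.com",
    "example.com/",
    "www.example.com/index.html",
    "example.com/home.asp",
]
for variant in variants:
    print(variant, "->", canonicalise(variant))
```

All four variants map to http://www.example.com/, which is exactly the "one page, one URL" outcome the rest of this post works towards.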

Canonicalisation: Implications for Search

The implications of canonicalisation issues are two-fold: a poor user experience and duplicate content on your site. You could also argue that leaving them unresolved can lead to poor brand visibility and even to missing out on naturally acquired links.

Poor User Experience and Duplicate Content

It is important that each unique page has a single URL. Search engines treat different versions of a URL as distinct pages, so they may index each version separately or, worse, show the versions that you do not want to rank. If different URLs (such as those shown above) host the same content, search engines can therefore treat it as duplicate content.

This can also create a poor user experience. To see an example, try the little experiment below:

  • Open your web browser and type “www.manchestercentral.co.uk” into the URL bar, then try again without the ‘www.’ (“manchestercentral.co.uk”) and see what happens:

[Screenshot: Manchester Central canonicalisation issues]

  • Now try the same with a website like wish.co.uk: typing in either “www.wish.co.uk” or “wish.co.uk” takes you to the same page:

[Screenshot: wish.co.uk]

Poor Brand Visibility/Unclaimed Link Equity

When other websites mention yours, it’s likely that some of them will link to a non-canonical version (e.g. example.com instead of www.example.com), which dilutes link authority among the different versions of the page.

It took me just five minutes to find some examples in need of a bit of TLC: Manchestercentral.co.uk (mentioned earlier) and some very well-known brands, such as GBK (Gourmet Burger Kitchen) and everybody’s favourite, Nandos. Just look at the links they are effectively missing out on…

[Screenshot: missed links due to canonicalisation issues]

As Matt puts it:

“Don’t make half of your links go to http://example.com/ and the other half go to http://www.example.com/” (Matt Cutts, 2006)
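To get a feel for how split your links are, you could tally inbound links by host. This is a minimal sketch that assumes you already have a list of the URLs other sites link to (for example, exported from a backlink tool); the list below is hypothetical sample data, not real figures for any site mentioned here.

```python
from collections import Counter
from urllib.parse import urlsplit


def links_per_host(link_targets):
    """Count how many inbound links point at each host variant."""
    counts = Counter()
    for target in link_targets:
        # Tolerate targets recorded without a scheme.
        if "://" not in target:
            target = "http://" + target
        counts[urlsplit(target).hostname] += 1
    return counts


def share_on_host(counts, host):
    """Fraction of all counted links that point at the given host."""
    total = sum(counts.values())
    return counts[host] / total if total else 0.0


# Hypothetical sample data, not real backlink figures.
links = [
    "http://www.example.com/",
    "http://example.com/",
    "http://example.com/menu",
    "http://www.example.com/menu",
    "www.example.com/contact",
]
counts = links_per_host(links)
print(counts)
print(f"{share_on_host(counts, 'example.com'):.0%} of links point at the non-www version")
```

Any sizeable share on the non-preferred host is link equity that a 301 redirect (covered below) would help consolidate.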

Canonicalisation: How to Discover the Problems

The good news is that discovering whether you have canonicalisation issues (or duplicate content resulting from them) really isn’t too difficult. Simply arm yourself with the ‘toolkit’ I’ve identified and carry out the steps below:

Canonicalisation Toolkit:

Step 1:

Take a particularly important page on your site and copy a sentence that you believe is unique to that page (i.e. not something like a generic brand description). Search for it in Google, surrounded by quote marks:

I did this with a sentence taken from one of GBK’s pages:

[Screenshot: GBK duplicate content check]
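If you run this check for lots of pages, a tiny helper can build the exact-match search URL for you. This is a sketch that assumes Google's standard `q` search parameter; `exact_phrase_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def exact_phrase_search_url(sentence: str) -> str:
    """Build a Google search URL for the sentence wrapped in quote marks."""
    # Quote marks ask Google for an exact-phrase match.
    query = '"' + sentence.strip() + '"'
    # urlencode() percent-encodes the quote marks and turns spaces into '+'.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(exact_phrase_search_url("A sentence you believe is unique to this page"))
```

Open the printed URL in your browser and count the results, as described in Step 2.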

Step 2:

The results should show just the one page in question (they do in the example above, which is good for GBK). If this is not the case and you can see more than one result, you may have duplicate content issues. Another telltale sign is a warning like this at the bottom of the SERPs:

[Screenshot: duplicate content filtering by Google]

Make sure that you note down the URLs of the pages being displayed. Are these the ones you want to be displayed? If not, it’s likely that you’ve got duplicate content caused (or at least affected) by canonicalisation. Below I’ve outlined several ways you can rectify this.
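Search results only show what Google has indexed. If you have noted down two URLs and want to compare what they actually serve, a rough similarity score can help. This is a minimal sketch using Python's standard difflib. The 0.9 threshold is an arbitrary assumption, not a figure Google uses, and you would need to fetch each page and extract its text yourself. The sample text below is made up.

```python
from difflib import SequenceMatcher

# Arbitrary, illustrative threshold: at or above it, treat pages as near-duplicates.
NEAR_DUPLICATE_THRESHOLD = 0.9


def similarity(text_a: str, text_b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two pages' text."""
    # Collapse whitespace and ignore case so formatting differences don't count.
    a = " ".join(text_a.split()).lower()
    b = " ".join(text_b.split()).lower()
    return SequenceMatcher(None, a, b).ratio()


def is_near_duplicate(text_a: str, text_b: str) -> bool:
    """True when two pages are similar enough to count as duplicate content."""
    return similarity(text_a, text_b) >= NEAR_DUPLICATE_THRESHOLD


# Made-up sample text standing in for two fetched pages.
page_a = "Welcome to Example Burgers.  Fresh burgers, made to order."
page_b = "Welcome to Example Burgers. Fresh burgers, made to order!"
print(round(similarity(page_a, page_b), 3), is_near_duplicate(page_a, page_b))
```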

Canonicalisation: How to Solve the Problems

There are four main ways to handle canonicalisation issues:

  • 301 Redirects
  • Rel=Canonical
  • NoIndex, Follow
  • Parameter Handling

I have outlined these below, in order of preference/SEO best practice:

301 Redirect

Generally considered the best cure for canonicalisation issues, a 301 redirect is the most concrete way of dealing with them. Ideally, your web server responds with a 301 (permanent) redirect to http://www.example.com/ whenever someone requests http://example.com/.

It is also important to decide whether you want to do it this way or the other way around. Wish.co.uk, mentioned above, has implemented it the other way round: the web server does a 301 (permanent) redirect to http://example.com/ when someone requests http://www.example.com/. Both are perfectly valid solutions, but you must pick one and stick with it to avoid confusion moving forward.
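In practice this redirect is usually configured in the web server itself (for example Apache, nginx or IIS). Purely to illustrate the logic, here is a minimal sketch using Python's standard http.server. The hostnames are placeholders, the redirect assumes plain http, and swapping CANONICAL_HOST and ALTERNATE_HOST gives you the wish.co.uk-style non-www setup.

```python
from http.server import BaseHTTPRequestHandler

# Placeholder hosts: swap them to prefer the non-www version instead.
CANONICAL_HOST = "www.example.com"
ALTERNATE_HOST = "example.com"


def redirect_location(host_header, path):
    """Return the 301 target for a request, or None if it is already canonical."""
    # Ignore any port number and letter case in the Host header.
    host = (host_header or "").split(":")[0].lower()
    if host == ALTERNATE_HOST:
        # Keep the path and query string so deep links land on the same page.
        return "http://" + CANONICAL_HOST + path
    return None


class CanonicalHostHandler(BaseHTTPRequestHandler):
    """Serve the canonical host; permanently redirect the alternate one."""

    def _respond(self, include_body):
        location = redirect_location(self.headers.get("Host"), self.path)
        if location:
            self.send_response(301)  # permanent redirect, not a 302
            self.send_header("Location", location)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"This is the canonical page.\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if include_body:
            self.wfile.write(body)

    def do_GET(self):
        self._respond(include_body=True)

    def do_HEAD(self):
        self._respond(include_body=False)
```

To try it locally, you could pass `CanonicalHostHandler` to `http.server.HTTPServer(("", 8000), CanonicalHostHandler)` and call `serve_forever()`. Requests whose Host header is example.com then get a 301 to the www. version, with the path and query string left intact.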

How to check it’s working

1. Once you have implemented this (or got your IT guy to implement it), you can check it’s working very simply. Take four pages from your website:

  • Homepage
  • Top-level page
  • Product page
  • Low-level page (may be a blog post, or an archived page)

2. Enter them all into your browser without including the ‘www.’.

3. Then see what the Ayima Redirect Path extension for Google Chrome is showing. It should look something like this:

[Screenshot: Ayima Redirect Path in Chrome]

Please note: ensure that this is not a 302 redirect. A 302 is classed as only a ‘temporary’ redirect: while it will redirect the user, it will not pass on any link equity as its 301 counterpart would.
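If you’d rather script this check than use a browser extension, the sketch below makes one request per URL without following redirects and reports the status code and Location header. It uses only Python's standard http.client; `first_hop` and `verdict` are hypothetical helper names. Note that some servers send a relative Location header, which this simple comparison would flag for a manual look.

```python
import http.client
from urllib.parse import urlsplit


def first_hop(url: str, timeout: float = 10.0):
    """Make one HEAD request without following redirects.

    Returns (status_code, location_header_or_None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def verdict(status, location, expected_location):
    """Judge a single redirect hop against the behaviour described above."""
    if status == 301 and location == expected_location:
        return "OK: permanent redirect to the canonical URL"
    if status in (302, 307):
        return "WARNING: temporary redirect; use a 301 instead"
    if status == 200:
        return "PROBLEM: non-canonical URL served directly, no redirect"
    return f"CHECK: status {status}, Location {location!r}"
```

For each of your four test pages, you would call something like `first_hop("http://example.com/some-page")` (a hypothetical path) and pass the result to `verdict` along with the www. URL you expect. A correctly configured site should produce the OK message for all four.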

Rel=Canonical

Rel=Canonical is a way of identifying the ‘canonical page’, i.e. the preferred page out of a set, that you want search engines to use. As Google puts it:

“A canonical page is the preferred version of a set of pages with highly similar content.”

There are two ways of implementing rel="canonical" on your site:

1. First, you can add a rel="canonical" link element to the <head> section of every non-canonical version of the page, pointing to the canonical URL. For example:

[Screenshot: rel="canonical" implementation example]

2. A rarer implementation applies when the offending page’s content is not HTML (such as a PDF file): you can indicate the canonical version of each URL by using a rel="canonical" HTTP Link header:

Link: <http://www.example.com/downloads/example-white-paper.pdf>; rel="canonical"
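Both forms carry the same piece of information: the canonical URL. Here is a minimal sketch of producing each one with Python's standard library. The function names are hypothetical, and the URLs are the placeholders used above.

```python
from html import escape


def canonical_link_element(canonical_url: str) -> str:
    """HTML <link> element for the <head> of each non-canonical page."""
    # Escape the URL so characters like & or quote marks can't break the attribute.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


def canonical_link_header(canonical_url: str) -> tuple[str, str]:
    """(name, value) pair for the HTTP Link header, e.g. on a PDF response."""
    return ("Link", f'<{canonical_url}>; rel="canonical"')


print(canonical_link_element("http://www.example.com/"))
print(": ".join(canonical_link_header(
    "http://www.example.com/downloads/example-white-paper.pdf")))
```

The second line printed matches the header shown above, so the same canonical URL can feed either the HTML template or the server's response headers.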

You can learn more about best-practice implementation of rel="canonical" in Google’s guidelines.

Pro tip: make sure you read this article on five common mistakes with rel="canonical"; it’s surprisingly easy to implement it wrongly!
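A quick automated spot-check can catch some of the easier mistakes. The sketch below uses Python's html.parser to list every rel="canonical" link on a page. It flags a missing canonical, more than one canonical, a relative href, or a canonical placed outside <head>. These particular checks are illustrative, not a summary of the linked article, and the parser assumes the page has an explicit <head> tag.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Collect every rel="canonical" href and note whether it sits in <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = []  # list of (href, inside_head) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link":
            attributes = dict(attrs)
            # rel may hold several space-separated values.
            rels = (attributes.get("rel") or "").lower().split()
            if "canonical" in rels:
                self.found.append((attributes.get("href"), self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def audit_canonical(html: str) -> list[str]:
    """Return a list of problems found with the page's rel="canonical" links."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    problems = []
    if not finder.found:
        problems.append("no rel=canonical found")
    if len(finder.found) > 1:
        problems.append("multiple rel=canonical links declared")
    for href, in_head in finder.found:
        if not href or not urlsplit(href).scheme:
            problems.append(f"href {href!r} is not an absolute URL")
        if not in_head:
            problems.append(f"canonical {href!r} is outside <head>")
    return problems


page = '<html><head><link rel="canonical" href="/page"></head><body></body></html>'
print(audit_canonical(page))
```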

NoIndex, Follow

Although not ideal from an SEO best-practice point of view, if you are still experiencing problems you can use the NoIndex, Follow META directive. This tag essentially says to Google: “Please crawl this page and follow the links on it, but do not index this page”.

You can implement this by adding the following line to the <head> section of the HTML page in question:

<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">

Effectively this will direct search engine crawlers to do the following:

[Diagram: NoIndex, Follow META tag example]

Find out more about the NoIndex, Follow META tag here.
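To confirm the tag is actually in place, you can read the robots META directives back out of a page's HTML. This is a minimal sketch using Python's html.parser. It only looks at the META tag (not the X-Robots-Tag HTTP header), and it treats the `none` value as shorthand for noindex, nofollow.

```python
from html.parser import HTMLParser


class RobotsMetaReader(HTMLParser):
    """Collect the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":  # tag and attribute names arrive lowercased
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() == "robots":
            for directive in values.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def robots_directives(html: str):
    """Return (indexable, links_followed) according to the robots META tag."""
    reader = RobotsMetaReader()
    reader.feed(html)
    reader.close()
    found = reader.directives
    indexable = not ({"noindex", "none"} & found)
    followed = not ({"nofollow", "none"} & found)
    return indexable, followed


page = '<head><META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW"></head>'
print(robots_directives(page))
```

For the tag shown above, this prints `(False, True)`. The page is kept out of the index, but its links are still followed.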

Parameter Handling

Parameter handling is probably the lowest of the four in terms of best practice. However, if you have discovered that you have pagination or duplicate content issues, Google Webmaster Tools lets you tell Google how to handle the different URL parameters it comes across on your site. Google is pretty good at handling parameters on its own (though there is always room for improvement), but this option lets you see which parameters Google has discovered and choose one of the following four options for each:

  • Let Googlebot decide
  • Every URL
  • Only crawl URLs with value=x
  • No URLs

This topic could warrant a blog post of its own, but the best guide I have seen on deciding how Google should handle parameters is the one provided by Google itself. That post goes into much more detail, but the handy table below should also help guide you in setting up parameter handling:

[Table: guide to setting up parameter handling]
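To illustrate why parameter handling matters, here is a minimal sketch showing how URLs that differ only in content-irrelevant parameters collapse to one address once those parameters are dropped. This is not how Webmaster Tools works internally. The set of ignored parameters is a hypothetical example that you would replace with whatever genuinely doesn't change the content on your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters assumed not to change page content on this site.
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}


def strip_parameters(url: str) -> str:
    """Drop ignored parameters and sort the rest into a stable order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    # Parameter order rarely changes the content, so sort for a stable URL.
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(strip_parameters(
    "http://www.example.com/shoes?sessionid=123&colour=red&utm_source=news"))
```

Here the session and tracking parameters disappear, leaving http://www.example.com/shoes?colour=red. Parameters that do change the content, like colour, are kept.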

That’s All Folks!

Congratulations! If you’ve made it this far, you should now be better equipped at:

  • Understanding what that ‘canonicalisation’ word is all about
  • Recognising the potential SEO and usability downfalls of not handling these issues correctly
  • Identifying whether your site has canonicalisation issues
  • Rectifying these problems in four different ways

I’m more than happy to open the floor to any questions or observations that anyone has had while trying to fix canonicalisation issues. If you have anything to add, please drop a line in the comments below.

AUTHORED BY:

Ned Poulter is the Co-Founder of AvitaDigital, a Digital Marketing Consultancy based in Copenhagen, Denmark. He specialises in all aspects of SEO, digital marketing consulting and PPC, amongst many other things.

8 Responses to “URL Canonicalisation: How to Diagnose & Rectify Errors”

  1. Victor Codero says:

    Hi Ned!

    Nice post!

    Fix the examples on NOINDEX, FOLLOW, current: .

  2. Great stuff Ned. Great that you’ve called-out Ayima Redirect Path – one of the first extensions I add to a new Chrome install. And on that note, a quick extension/add-on addendum related to canonicalization for you and your readers. Namely, extensions that show you in the status bar if a rel=”canonical” is present and, if so, whether or not it matches the URL of the present page.

    For Chrome I use the extension “Canonical Inspector” by Tobias Redmann, and for Firefox “SearchStatus” by Quirk (SearchStatus includes an SEO context menu as well).

    As a bonus, both allow you to click on the “C” when it’s not the canonical URL to be taken to the canonical version: a useful way of stripping out parameters if you want an unadorned URL to share or bookmark.

    I can’t tell you the number of times this simple icon has saved my bacon in helping me identify malformed canonicals (like the time an update resulted in the home page being assigned as the value of rel=”canonical” for the entire site!). Since the indicator is always there when you’re browsing, a pretty much zero-effort and more thorough alternative to code spot-checks for the presence and href content of rel=”canonical”.

    • Ned Poulter says:

      Hi Aaron,

      Thanks for the response and sharing some of the extensions that you use. I’m familiar with all apart from the ‘Canonical Inspector’, that I’m going to check out now! Funnily enough, I’m VERY familiar with SearchStatus – as I worked for Quirk for some 2 years ;-)

      Great input though – really added some useful pointers (I’ll hopefully get round to adding them in), and very glad that you liked the article.

      Have a great day!

  3. Spook SEO says:

    Hi Ned,

    Once again, this is another very informative article from you. I’ve read one of your articles “How to Improve Your SEO” and I find it so helpful for my SEO. BTW, I use Ayima Google Chrome Redirect Path add-on on these issues and often use META directive NoIndex Follow. It could really fix the problem.

  4. Anthony Lavall says:

    Great find with Nandos and GBK – just shows how SEO is still in its infancy with so many big companies…
