URL Canonicalisation: How to Diagnose & Rectify Errors
A couple of months back, I provided some advice for an article on TheNextWeb entitled ‘How to Improve Your Site’s SEO’, which focused on providing tips for small businesses. One of the tips I provided there was for individuals to check that their domain resolves at the correct URL, something that we SEOs refer to as canonicalisation.
In this article I also stated that should anyone be having issues with canonicalisation errors then they could get in contact with me. Many did, and therefore this post will:
- Explain what canonicalisation is
- Identify the (adverse) implications for search
- Show how to uncover these problems
- Show how you can fix them if you are having these problems
Let’s get started…
“So what is canonicalisation then?” I hear you ask. Canonicalisation issues occur when the same or very similar content can be accessed at multiple URLs. This is often caused by default settings in web servers. However, it means that each URL in a seemingly innocent list, such as the ‘www.’ and non-‘www.’ versions of the same page, could technically serve different content.
Therefore,
“When Google “canonicalizes” a url, we try to pick the url that seems like the best representative from that set.” (Matt Cutts, 2006)
However, as with a lot of things in SEO, in order to do this most effectively we’ve got to help the search engines out a little bit.
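To make the idea of a ‘set’ of URLs more concrete, here is a minimal Python sketch of how several variants can collapse into one preferred form. The domain, the preferred ‘www.’ host and the list of default document names are all assumptions I’ve made for illustration; this is not how Google does it, just a way of seeing the problem.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only: swap in your own domain, your own
# preferred host and whatever default documents your server exposes.
BARE_DOMAIN = "example.com"
PREFERRED_HOST = "www.example.com"
DEFAULT_DOCUMENTS = ("index.html", "index.php", "default.aspx")


def canonical_form(url: str) -> str:
    """Collapse common variants of a URL into one preferred form.

    Ports and fragments are ignored to keep the sketch short.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Treat the bare domain and the www host as the same site.
    if host in (BARE_DOMAIN, "www." + BARE_DOMAIN):
        host = PREFERRED_HOST
    path = parts.path or "/"
    # Strip a trailing default document such as /index.html.
    head, _, last = path.rpartition("/")
    if last.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


variants = [
    "http://example.com",
    "http://www.example.com/",
    "http://www.example.com/index.html",
]
# Every variant maps to the same preferred URL.
print({canonical_form(v) for v in variants})
```

If the set printed at the end contains more than one URL for what you consider a single page, that is exactly the situation a search engine has to untangle.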
Canonicalisation: Implications for Search
The implications of canonicalisation issues are two-fold: they provide a poor user experience and they duplicate content on your site. You could also make the argument that leaving canonicalisation issues unsolved can lead to poor brand visibility and even to missing out on naturally acquired links.
Poor User Experience and Duplicate Content
It is important that each unique page has a single URL. Search engines view different versions of a URL as distinct individual pages. The problem here is that they can index them individually or, worse, show the pages that you do not wish to be ranking. Therefore, if different URLs (as shown above) host the same content, search engines can consider this to be duplicate content.
Additionally, this can provide a poor user experience. To see an example of this, try out the little experiment below:
- Open up your web browser and type “www.manchestercentral.co.uk” into the URL bar, then try again without the ‘www.’ (“manchestercentral.co.uk”) and witness what happens:
- Now try this again with a website like wish.co.uk. Typing in either “www.wish.co.uk” or “wish.co.uk” leads you to the same page:
Poor Brand Visibility/Unclaimed Link Equity
It’s natural to assume that when other websites mention yours, some of them will link to the non-canonical versions (e.g. example.com), resulting in a dilution of link authority among the different versions of the page.
It took me just 5 minutes to discover some examples that needed a bit of TLC when looking at Manchestercentral.co.uk (mentioned earlier) and some very well known brands, such as GBK (Gourmet Burger Kitchen), and, everybody’s favourite, Nandos. Just look at the links that they are effectively missing out on…
As Matt puts it:
“Don’t make half of your links go to http://example.com/ and the other half go to http://www.example.com/” (Matt Cutts, 2006)
Canonicalisation: How to Discover the Problems
The good news is that discovering whether you have canonicalisation issues or duplicate content (as a result of canonicalisation issues) really isn’t too difficult. Simply arm yourself with the ‘toolkit’ I’ve identified and carry out the steps below:
- Browser open on Google
- Ayima Google Chrome Redirect Path add-on installed (incidentally one of my favourite Chrome plugins)
- An eye-for-detail
- Some common sense
Take a particularly important page on your site, then select and copy a sentence that you believe is unique to that page (i.e. not something like a generic brand description). Search for this in Google surrounded by quote marks:
I took the sentence below from this page and did this:
The result should show just the one page in question (it does in the example above, which is good for GBK). If this is not the case and you can see more than one result, then you may have duplicate content issues. Another telltale sign is seeing a warning like this at the bottom of the SERPs:
Make sure that you note down the URLs of the pages being displayed. Are these the ones that you want to be displayed? If not, then it’s likely you’ve got duplicate content caused by (or at least affected by) canonicalisation. Below I’ve outlined several ways you can rectify this.
Canonicalisation: How to Solve the Problems
There are 4 main ways that you can ensure that you handle canonicalisation issues:
- 301 Redirects
- Rel=Canonical
- NoIndex, Follow
- Parameter Handling
I have outlined these below, in order of preference/SEO best practice:
301 Redirects
A 301 redirect is generally considered the best cure for canonicalisation issues, and implementing one should be considered the most concrete way of dealing with them. The ideal implementation is that your web server does a 301 (permanent) redirect to http://www.example.com/ if someone requests http://example.com/.
It is also important to decide whether you want to do it this way or the other way around. In the example of Wish.co.uk I mentioned above, they’ve implemented this the other way round: the web server does a 301 (permanent) redirect to http://example.com/ if someone requests http://www.example.com/. Both are perfectly valid solutions, but you must pick one and stick with it to save confusion moving forward.
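In practice this is usually a one-off setting in your web server or hosting control panel, but to show the logic, here is a minimal sketch written as Python WSGI middleware. The canonical host and the function name are assumptions of mine for illustration; your own stack may look nothing like this.

```python
from wsgiref.util import request_uri

# Assumption: the www version is the one you have chosen to keep.
CANONICAL_HOST = "www.example.com"


def canonical_host_middleware(app, canonical_host=CANONICAL_HOST):
    """Wrap a WSGI app so requests on any other host get a 301."""

    def wrapped(environ, start_response):
        host = environ.get("HTTP_HOST", "").lower()
        if host and host != canonical_host:
            # Rebuild the requested URL, swapping in the canonical host
            # while keeping the path and query string intact.
            target = dict(environ, HTTP_HOST=canonical_host)
            location = request_uri(target, include_query=True)
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]
        return app(environ, start_response)

    return wrapped
```

Note that the redirect keeps the path, so a request for the non-canonical version of a deep page lands on the matching canonical page rather than being dumped on the homepage. A host header that includes a port would also be redirected here, which may or may not be what you want.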
How to check it’s working
1. Once you have implemented this (or got your IT guy to implement this), you can check it’s working very simply. Take a few pages from different levels of your website:
- Top-level page
- Product page
- Low-level page (may be a blog post, or an archived page)
2. Enter them all into your browser without including the ‘www.’.
3. Then see what the Ayima Google Chrome Redirect Path add-on is showing. It should look something like this:
Please note: Ensure that this is not a 302 redirect. A 302 redirect is only classed as a ‘temporary redirect’, so while it will redirect the user, it will not redirect any link equity as its 301 counterpart would.
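If you would rather script this check than click through pages by hand, the sketch below shows one way to do it in Python. The function names and result messages are my own, and I have treated 308 as a permanent redirect alongside 301; only `fetch_first_hop` touches the network.

```python
import http.client
from urllib.parse import urlsplit

PERMANENT = {301, 308}
TEMPORARY = {302, 303, 307}


def fetch_first_hop(url):
    """Request a URL without following redirects.

    Returns the status code and the Location header (or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def assess_redirect(status, location, expected_url):
    """Classify one redirect hop against the URL you expect to land on."""
    if status in PERMANENT and location == expected_url:
        return "ok: permanent redirect to the expected URL"
    if status in PERMANENT:
        return "warning: permanent redirect, but to an unexpected URL"
    if status in TEMPORARY:
        return "warning: temporary redirect"
    return "warning: no redirect"
```

You could then run something like `assess_redirect(*fetch_first_hop("http://example.com/"), "http://www.example.com/")` for each of the pages you picked, replacing the example URLs with your own.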
Rel=Canonical
Rel=Canonical is a way of identifying the ‘canonical page’, i.e. the preferred page out of a set, that you wish search engines to use. As Google puts it:
“A canonical page is the preferred version of a set of pages with highly similar content.”
There are 2 ways of implementing rel=”canonical” on your site:
1. First, you can add a rel=”canonical” link to the <head> section of all of the non-canonical versions of the page. This means that in the example below:
2. A rarer implementation applies when the offending page’s content is not in HTML (such as a .PDF file). In that case you can indicate the canonical version of each URL by using the rel=”canonical” HTTP Link header:
Link: <http://www.example.com/downloads/example-white-paper.pdf>; rel="canonical"
You can understand more about the best practice implementation of rel=”canonical” using Google’s guidelines.
Pro-Tip: Ensure that you read this article on 5 common mistakes with rel=”canonical”, it’s surprising how easy it is to wrongly implement!
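As a quick sanity check on the HTML version, the sketch below uses Python’s built-in HTML parser to list every rel="canonical" link it finds on a page. The class and function names are my own and the sample page is made up. For brevity it does not check that the tag sits inside the <head>.

```python
from html.parser import HTMLParser


class CanonicalLinkFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> on a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonicals(html):
    finder = CanonicalLinkFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical page used purely as sample input.
sample = (
    "<html><head>"
    '<link rel="canonical" href="http://www.example.com/page">'
    "</head><body></body></html>"
)
print(find_canonicals(sample))
```

A healthy page should give you exactly one URL back. An empty list means the tag is missing, and more than one entry means the page is sending mixed signals.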
NoIndex, Follow
Although not ideal from an SEO best practice point of view, if you are still experiencing problems, you can use the META directive NoIndex, Follow. This tag essentially says to Google: “Please crawl this page and every page that is linked to here, but do not index this page”.
You can implement this by adding the following line to the <head> section on the HTML page in question.
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
Effectively this will direct search engine crawlers to do the following:
Find out more on the NoIndex, Follow META tag here.
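To confirm the tag has actually made it onto a page, a small check like the one below can help. It is a minimal Python sketch with names of my own choosing, and it only reads the robots META tag in the HTML; it does not look at HTTP headers or robots.txt.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated and case-insensitive.
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


def is_noindex_follow(html):
    directives = robots_directives(html)
    return "noindex" in directives and "nofollow" not in directives
```

Feeding it the line shown earlier, `is_noindex_follow('<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">')` returns `True`.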
Parameter Handling
This is probably the lowest in terms of best practice implementation. However, if you have discovered that you have pagination or duplicate content issues, Google Webmaster Tools offers you the ability to specify how it should handle the different parameters that it comes across on your site. Google is pretty good at handling parameters (although there is always room for improvement), and this option allows you to see what parameters Google has discovered and select one of the following 4 options:
- Let Googlebot decide
- Every URL
- Only crawl URLs with value=x
- No URLs
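Before picking an option, it helps to be clear about which of your parameters actually change what the page shows and which do not. The Python sketch below illustrates that distinction by stripping the ones that do not; the parameter names are hypothetical and this is not a description of how Webmaster Tools works internally.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change the page content.
IGNORED_PARAMETERS = {"sessionid", "sort", "utm_source"}


def strip_ignored_parameters(url, ignored=IGNORED_PARAMETERS):
    """Drop query parameters that do not affect what the page shows."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(strip_ignored_parameters(
    "http://www.example.com/shoes?colour=red&sort=price&sessionid=abc"
))
```

URLs that come out identical after this step are the ones showing the same content under different addresses, which is the duplication that parameter handling is meant to deal with.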
This could warrant a blog post itself, but the best guide that I have seen on how to identify how Google should handle parameters is the one provided by Google themselves. This post goes into much more detail, but this handy table below should also help guide you setting up parameter handling:
That’s All Folks!
Congratulations! If you’ve made it to here then you should now be better equipped in:
- Understanding what that ‘canonicalisation’ word is all about
- Recognising the potential SEO and usability downfalls of not handling these issues correctly
- Identifying whether your site has canonicalisation issues
- Applying 4 ways of rectifying these problems
I’m more than happy to open the debate for any questions or observations that anyone has had while trying to fix canonicalisation issues. If you have anything, then please drop a line in the comments below.