Maybe you remember my post on what to look at to make sure you don't get cheated in link building? Well, that list needs to be extended by one very, very important item: the "newly" introduced canonical tag delivered via an HTTP header directive.
But first of all, let's have a look at what happened: Last Friday there was a post on the official Google Webmaster blog stating the following: "Based on your feedback, we're happy to announce that Google web search now supports link rel="canonical" relationships specified in HTTP headers […]". OK, so what does that mean? Up to now you had to place the canonical tag within a page's HTML source code, which might look like this:
<link rel="canonical" href="http://www.domain.com/keyword.html" />
Shortly after the tag was introduced in 2009, there was some feedback (including this one) saying that it would have been better to also allow HTTP headers as a way of implementing the directive, mainly because being limited to an HTML tag restricts the usage quite a bit. So, as of now, you can also use the tag to canonicalize other document types, say PDFs. This makes a lot of sense if, for example, you provide a downloadable PDF version of your content but also display the same information directly on your site. Looking "behind the scenes", the HTTP header would look like this:
Link: <http://www.domain.com/keyword.html>; rel="canonical"
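If you serve your files via Apache with mod_headers enabled, one way to send this header for PDFs could be a configuration snippet like the following. This is just a sketch; the file pattern and target URL are placeholders, and in practice you'd usually point each PDF at its own HTML counterpart rather than sending one URL for all PDFs.

```apache
# Assumes mod_headers is enabled; attaches a canonical Link header
# to every PDF served from this directory (placeholder URL).
<FilesMatch "\.pdf$">
  Header add Link '<http://www.domain.com/keyword.html>; rel="canonical"'
</FilesMatch>
```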
I will spare you further details on how to do it; all of that can be found in the official blog post. Maybe just one quick thing: note that Google explicitly stated that this header only works in web search, not in other verticals!
But now let's get back to my first statement. If you "do in links" – whatever that might mean – you're probably also into checking and verifying link implementations, right? Or even better: you have software doing that for you? Great! But no matter how you do it, as of today you need to consider that people might get even more creative in trying to cheat you. And the possibility of placing yet another nearly "invisible" HTTP header doesn't exactly make things easier…
That being said, let's have a quick look at what can happen. Say you're getting a link from www.domain.com/site1.html – the page is nicely linked internally, maybe even has some external link value, and fits the topic (and all the other criteria you look at when building links) – simply a great page to get a link from. You'd look at the robots meta tag and also check the HTML canonical tag – all looks good. But if someone placed an HTTP header like the one above and canonicalized www.domain.com/site1.html to www.domain.com/site2.html – well, that wouldn't be nice, would it? And with a quick check of the HTML alone, you wouldn't even notice…
So really consider double-checking link-source pages with regard to their HTTP headers as well. Not only the canonical tag can cause you trouble; an X-Robots-Tag directive (e.g. a combination of "noindex" and/or "nofollow") might also completely devalue the link you just acquired. If you're on a software-based solution, I'd recommend implementing that check right away (we just did) – and if you're relying on manual checks, I'd suggest you grab HttpFox for Firefox (or a similar tool) to take a quick look at a website's HTTP headers.
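For the software route, such a check can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the function name, the example URLs, and the simple parsing are assumptions, and a real implementation would fetch the headers itself (e.g. with urllib), treat header names case-insensitively, and handle multiple Link values.

```python
# Minimal sketch: inspect a page's HTTP response headers for directives
# that could devalue an acquired link, i.e. a canonical pointing to a
# different URL, or noindex/nofollow in X-Robots-Tag.
import re

def check_link_headers(headers, page_url):
    """Return a list of warnings for a dict of HTTP response headers."""
    warnings = []

    # HTTP canonical, e.g.:  Link: <http://.../other.html>; rel="canonical"
    link = headers.get("Link", "")
    match = re.search(r'<([^>]+)>\s*;\s*rel="?canonical"?', link, re.I)
    if match and match.group(1).rstrip("/") != page_url.rstrip("/"):
        warnings.append("canonical points to %s" % match.group(1))

    # X-Robots-Tag, e.g.:  X-Robots-Tag: noindex, nofollow
    robots = headers.get("X-Robots-Tag", "").lower()
    for directive in ("noindex", "nofollow"):
        if directive in robots:
            warnings.append("X-Robots-Tag contains %s" % directive)

    return warnings

# Example: headers as the scenario above might return them
headers = {
    "Link": '<http://www.domain.com/site2.html>; rel="canonical"',
    "X-Robots-Tag": "noindex, nofollow",
}
for warning in check_link_headers(headers, "http://www.domain.com/site1.html"):
    print(warning)
```

Run against the scenario from above, this would flag the canonical pointing at site2.html plus both robots directives – exactly the things a purely HTML-based check would miss.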