
One step ahead in link-building: Make sure you’ll not get cheated

26 October 2010 BY Bastian Grimm


From what I heard, online marketers and especially SEOs might sometimes be interested in getting links pointing to a self-owned (or a client’s) website ;) OK, fun aside. When building links, one of the things that really annoys me is people trying to cheat you – and I’m not talking about a PR 4 vs. a PR 6 link, but more about stuff like removing a link after striking a deal (which is more or less the easiest to detect), cloaking content based on user-agents/IPs, or using methods like the X-Robots-Tag header or the robots meta tag to prevent pages from being indexed.

This article is a small checklist of things you’ll have to watch out for – or, if possible, that your link-building backend should keep an eye on (and really make sure you run these checks on a regular basis, say at least once a week):

To have some URL patterns to work with, I’m going to use www.domain.com (as the root domain) and www.domain.com/path/content.html (where the link is placed) in the following examples. And just to be sure: I take it as basic knowledge that the link you agreed to have placed on that specific page is implemented as a hyperlink using the good old <a> tag!

1. robots.txt
First of all I’d recommend making sure that the URL is not disallowed via robots.txt (which would be located at www.domain.com/robots.txt). In our case we’d have to watch out for multiple patterns, including:

  • Disallow: /path
  • Disallow: /*.html$
  • Disallow: /*content
  • Disallow: /path/content.html

You get the idea – and keep in mind to check ALL relevant user-agents, because sometimes these pages will only be disallowed for a specific search engine, let’s say Googlebot. A minimal sketch of such an automated check is shown below.
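
To give you an idea, here is what that check could look like in Python – a sketch of my own, using the example URLs from above. One caveat: the standard-library parser follows the original robots.txt spec, so wildcard rules like Disallow: /*.html$ would need a more capable third-party parser.

```python
# Minimal sketch of an automated robots.txt check (example URLs assumed).
from urllib.robotparser import RobotFileParser

LINK_URL = "http://www.domain.com/path/content.html"
ROBOTS_URL = "http://www.domain.com/robots.txt"

# Check against every crawler you care about, not just one.
CRAWLERS = ["Googlebot", "Bingbot", "*"]

parser = RobotFileParser(ROBOTS_URL)
parser.read()  # fetches and parses www.domain.com/robots.txt

for agent in CRAWLERS:
    if not parser.can_fetch(agent, LINK_URL):
        print(f"WARNING: {LINK_URL} is disallowed for {agent}")
```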

2. HTTP status code & X-Robots-Tag header
The second thing to do is to send a simple HTTP HEAD request to the URL where the link is placed. Looking at the response, you should especially pay attention to:

  • The HTTP response code: If the HTTP response code is NOT a 200 (= OK) or a cached one (e.g. 304 Not Modified), there might be trouble.
  • The X-Robots-Tag header: Generally hard to spot when just browsing the website, but pretty easy to check in an automated way (see the sketch after this list). Just grab the “X-Robots-Tag” header (if it’s not present, you can skip right away, everything is fine for now!) and check the value – if it contains a “noindex” or a “nofollow” string (or even both), you need to contact the link owner, because for now the link is worthless (a/ the page is not being indexed, b/ the link is not being followed, or c/ both).
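
A minimal sketch of that HEAD-request check, assuming the third-party requests package (any HTTP client will do):

```python
# Minimal sketch: send a HEAD request and inspect status + X-Robots-Tag.
import requests

LINK_URL = "http://www.domain.com/path/content.html"

resp = requests.head(LINK_URL, allow_redirects=False, timeout=10)

# Anything other than 200 (or 304 on a conditional request) needs a look.
if resp.status_code not in (200, 304):
    print(f"WARNING: unexpected status {resp.status_code} for {LINK_URL}")

# If the header is absent, everything is fine for now.
x_robots = resp.headers.get("X-Robots-Tag", "").lower()
if any(v in x_robots for v in ("noindex", "nofollow", "none")):
    print(f"WARNING: X-Robots-Tag says {x_robots!r} - contact the link owner")
```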

3. Meta robots & canonical tag
The next things to watch out for are the meta robots tag as well as the canonical tag. Both can be set in a way that makes your link completely worthless:

  • Meta robots tag: It’s pretty much the same as with the X-Robots-Tag header; you need to watch out for a “noindex” or “nofollow” value. By the way, another pretty rare value is “none” – this is basically a shortcut for “noindex,nofollow” – make sure you also check for this one!
  • Canonical tag: Let’s continue with a look at the canonical tag – if its value is different from www.domain.com/path/content.html, the link doesn’t help, because as we all know a canonical is treated similarly to a 301 redirect, and in this case search engines would pass all link juice from this URL to the one the canonical tag points to. (A sketch covering both checks follows this list.)
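
Both tags live in the HTML source, so an automated check needs to parse the page. A minimal sketch, assuming requests and BeautifulSoup (both third-party packages – my choice, not a requirement):

```python
# Minimal sketch: check the meta robots tag and the canonical tag.
import requests
from bs4 import BeautifulSoup

LINK_URL = "http://www.domain.com/path/content.html"

html = requests.get(LINK_URL, timeout=10).text
soup = BeautifulSoup(html, "html.parser")

# Meta robots: watch out for noindex, nofollow and the "none" shortcut.
meta = soup.find("meta", attrs={"name": "robots"})
if meta:
    content = meta.get("content", "").lower()
    if any(v in content for v in ("noindex", "nofollow", "none")):
        print(f"WARNING: meta robots is {content!r}")

# Canonical: if it points anywhere but the page itself, the link juice
# gets passed along to that other URL instead.
canonical = soup.find("link", attrs={"rel": "canonical"})
if canonical and canonical.get("href") not in (None, LINK_URL):
    print(f"WARNING: canonical points to {canonical.get('href')!r}")
```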

4. The rel-attribute
Let’s move on to the link level, shall we? As mentioned earlier, the obvious thing to do is to check for the link itself – this could be done using a regular expression, which would also allow you to validate the href attribute and, of course, the anchor text value.
Additionally, you’d have access to the other attributes being used – like, for example, the rel attribute – to detect “nofollow”ed links (if you care about those).
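
Instead of a plain regular expression, a parser makes it easy to validate the href, the anchor text, and the rel value in one go – a minimal sketch, where the target URL and anchor text are made-up placeholders:

```python
# Minimal sketch: verify the link itself - href, anchor text and rel.
import requests
from bs4 import BeautifulSoup

PAGE_URL = "http://www.domain.com/path/content.html"
MY_URL = "http://www.mysite.com/"   # placeholder: the href you agreed on
MY_ANCHOR = "my anchor text"        # placeholder: the agreed anchor text

soup = BeautifulSoup(requests.get(PAGE_URL, timeout=10).text, "html.parser")

link = soup.find("a", href=MY_URL)
if link is None:
    print("WARNING: the link has been removed!")
else:
    if link.get_text(strip=True) != MY_ANCHOR:
        print(f"WARNING: anchor text is now {link.get_text(strip=True)!r}")
    # BeautifulSoup parses rel as a list of values.
    if "nofollow" in (link.get("rel") or []):
        print("WARNING: the link has been nofollowed")
```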

5. Cloaking
Another thing that seems to be getting quite popular at the moment is cloaking links away from search engines. Basically, what some webmasters try to do is deliver the content including the hyperlink to users, but remove the link when a crawler accesses the website (to reduce the number of outbound links, I guess). The most common tactic is still doing it on a per-user-agent basis, which is pretty easy to detect – but if it gets more advanced, like IP-based delivery, you’d probably also have to validate the Google cached version of that specific page to really ensure the link is present.
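
For the simple, user-agent-based flavour, a check can fetch the page twice and compare – a minimal sketch (the crawler string is Googlebot’s published user-agent; IP-based cloaking can’t be caught this way):

```python
# Minimal sketch: detect user-agent-based cloaking of a link.
import requests

PAGE_URL = "http://www.domain.com/path/content.html"
MY_URL = "http://www.mysite.com/"  # placeholder: the href you agreed on

AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "crawler": "Mozilla/5.0 (compatible; Googlebot/2.1; "
               "+http://www.google.com/bot.html)",
}

seen = {}
for name, user_agent in AGENTS.items():
    html = requests.get(PAGE_URL, headers={"User-Agent": user_agent},
                        timeout=10).text
    seen[name] = MY_URL in html

if seen["browser"] and not seen["crawler"]:
    print("WARNING: the link is shown to users but hidden from crawlers!")
```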

6. More stuff to watch out for
– Is that page being indexed at all? After a while (especially when it’s a new page) you should definitely do a search on Google for that specific URL to make sure the page is indexed – and if not, is the page at least linked internally so it gets any link juice from the domain at all?
– It could also be very interesting to monitor the number of outbound links on www.domain.com/path/content.html – because if that number rises big-time, it might be bad for you (you’ll probably get less link power, and maybe the page is being turned into an obvious link-selling page from that point onwards?).
– And last but not least, it might be interesting to monitor whether some of the common “PPC” keywords / bad words show up on that page. Maybe the page got hacked and you’re now in a very bad neighborhood? I’d love to know ;) (A sketch for these last two checks follows below.)
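
The last two points lend themselves to automation as well – a minimal sketch for counting outbound links and scanning for bad-neighborhood keywords (the threshold and the word list are made-up placeholders you’d tune yourself):

```python
# Minimal sketch: monitor outbound link count and bad-word occurrences.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse

PAGE_URL = "http://www.domain.com/path/content.html"
BAD_WORDS = ("viagra", "casino", "payday")  # placeholder word list
MAX_OUTBOUND = 50                           # placeholder threshold

soup = BeautifulSoup(requests.get(PAGE_URL, timeout=10).text, "html.parser")

# Count links pointing to other hosts (relative links count as internal).
host = urlparse(PAGE_URL).netloc
outbound = [a["href"] for a in soup.find_all("a", href=True)
            if urlparse(a["href"]).netloc not in ("", host)]
if len(outbound) > MAX_OUTBOUND:
    print(f"WARNING: {len(outbound)} outbound links - link-selling page?")

# Scan the visible text for spammy keywords.
text = soup.get_text(" ").lower()
for word in BAD_WORDS:
    if word in text:
        print(f"WARNING: bad-neighborhood keyword found: {word!r}")
```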

I hope this helps a little bit when dealing in links or when building a tool to make sure those links stay in place. If you don’t have a tool to do all these tasks at once, here are some Firefox add-ons that do some of the work for you:

  1. robots.txt: SearchStatus has a “show robots.txt” shortcut, viewing it is just two clicks away. Go get it here.
  2. HTTP status code: LiveHTTP Headers should do the trick.
  3. X-Robots-Tag & meta robots: SeeRobots does a great job of visualizing both values without having to look at headers or the source – grab it here.
  4. The canonical tag can easily be seen directly in Firefox (the blue icon near to the address bar), no add-on needed.
  5. Rel-attribute: SearchStatus does “nofollow” highlighting (and much more) – see the link above.
  6. To check whether a link is using the appropriate tag, it’s probably easiest to verify using Firebug (here) and its “Inspect HTML” function.
  7. Cloaking: To check for user-agent based delivery you could use the web developer toolbar in combination with the user-agent switcher.

Other add-on recommendations or more things you’d check for? Looking forward to your feedback in the comment section!

AUTHORED BY: Bastian Grimm

Bastian Grimm is founder and CEO of Grimm Digital. He mainly works as an online marketing consultant with a strong focus on organic search engine optimization (SEO). Grimm specializes in SEO strategy consulting, website assessments, as well as large-scale link building campaigns.

  • http://www.activetraffic.de activetraffic

    Nice one, Basti! ;-)
    And don’t forget to check the HTML code itself, as there might be hidden content or hidden links you can’t see in a regular view in your browser! Especially malware snippets are often placed within the code this way!

    greets
    Nico

  • http://www.grimm-digital.com/ Bastian Grimm

    I do agree that it might make sense to figure out if there are hidden elements within the site which contain a massive amount of outbound links – however, that would also be detected (as I wrote in 6., second point) when monitoring the number of links in general.

    Looking at this from a malware perspective, it might be a nice idea to query Google on a per-URL basis and check if they return a malware warning for that one – that’s going to be way easier than monitoring the linking website’s code (because you’d also need a JS interpreter to see if something bad is injected).

    Thanks for your comment! :)

  • http://www.marcelboast.com/badboteliminator/ Marcel

    One of the important things to look out for is the nofollow attribute – a dirty trick if you’re doing link exchanges!

  • http://www.grimm-digital.com/ Bastian Grimm

    I do agree – that’s what I meant when I wrote that you want to check the rel-attribute (see 4.) and check the value :)

    Thanks for your comment!

  • http://www.station-marketing.com Cours seo montreal

    Although most webmasters believe nofollow links are useless, there has been some research around, and if I remember right, some folks from SeoBook did a test by building a few thousand sites with a common typo word that was not present on the target page and not relevant to it either.

    It seems that nofollow links still pass some juice, or at least help rankings for some keywords. I personally judge the whole concept of PR obsolete anyway, as I have encountered several domains that were fairly aged – some with literally hundreds of thousands of NATURAL backlinks, among them enormous quantities of PR5+ links – that had pretty low PR.

  • http://magstags.com Mags

    It’s a great article, Bastian! I was actually going to write about this on my blog :)
    I have done a lot of link building and have run into all the issues mentioned. Some people do not know how tricky link building really is! I actually received a link building report from an agency recently, and when I went through the list, just 5% of the links actually had any SEO value! (no comments on that please :)

    Thank you for sharing. It’s great info!

