I’ve always believed that link building without knowing what your current link profile looks like is a bit like flying blind. Especially with the recent goings on, as an SEO you really need to have a good understanding of who’s linking to you, and in what context those sites are linking. Ideally, you should have a good handle on anchor text distribution, page rank distribution and also a solid idea of what your competitions link profiles look like – i.e do you fit in or does your site stand out like a sore thumb?
As a minimum step when starting a link building project, I’ll make sure I take a good look at the client’s current link profile, focusing in on what the anchor text looks like; and to try and ascertain what sort of link building has been conducted in the past. That helps me to make some decisions about what anchor text to target, and what I might be up against. Key to this is being able to quickly gather the data you need and organise it in a way that tells the right story.
Some SEO’s prefer Majestic, some prefer Open Site Explorer; personally I prefer using both. I’ve found over time that the best way to gather a full picture of a sites link profile is to combine the exports from both tools into one big data set. Remember when doing this that you need to filter out internal links from both tools – at this stage I’m only really interested in external links. You’ll also want to make sure that you’re looking at data to the root domain or sub domain, and not just one specific URL. Once you’ve got both data sets, you’re ready to start playing around and making sense of it all. It’s worth noting that when combining both data sets, I’m only keeping the source URL, destination URL and anchor text – I’ll start pulling in link metrics once I’ve got the initial stuff organised.
The first bit of playing around involves pulling out the domain from the source URL. To do this, I’ll add a new column and paste in the following:
My reasoning here is that I want to see the page rank for the domain and not the individual URL. I know that page rank is definitely not the best measure of quality, but for a quick check I still like it. Once you’ve separated all the domains out, copy them into a new tab called ‘domain PR’ ready for collecting the page rank. To collect the page rank data I use SEO Tools for Excel; if you haven’t got it installed go and do it right now!
If you’re working with a large number of links this is going to take some time. To make your life that little bit quicker and less painful make sure you de-dupe the list of domains first. Running multiple calls for one domain is just going to waste your time and make you go insane:
Make yourself a tea (or beer), enter the =googlepagerank([domain]) into your page rank column, sit back and you’re done! Once you’ve finished, make sure you paste in your results as values – otherwise you’ll be rerunning the crawl and pulling your hair out pretty quickly!
The next step is to pull this data through into your main data tab. To do this we need to make use of VLOOKUP. By now you should have a source URL column, Domain column, Destination column and Anchor Text column. Create a ‘Domain PR’ column and paste in the following VLOOKUP function:
B2 = linking domain
Sheet1!$A$1:$B$28 = my domain and page rank data range
2 = page rank data column within that range
0 = find exact match
Getting your head around VLOOKUP can be a bit tough at first; but once you have it’s something you’re likely to use all the time – so well worth the effort! Once we’ve done that we should end up with a table looking a little like:
It’s pretty obvious, but you probably don’t want to be wasting time looking through links that are no longer live. To save myself a bunch of time I use Screaming Frog to run a quick check to see which links are still in place. I wrote about this in a bit more detail here, but what we’re doing is using the custom filters to search the source code of each URL for mentions of your sites domain. That should give you a pretty good idea of which links are still live.
Using the export function in the custom tab of Screaming Frog, you’ll end up with a list of URL’s whereby your specified domain has been discovered. This list of URL’s can then be pasted into a new tab called ‘live list’:
To get this into our main data sheet we first need to add a column called ‘Live?’ What we really want is a simple ‘true’ or ‘false’ answer in that column to allow us to filter through by links that are still live, and filter out any links that no longer exist. To achieve that we’re going to make use of the following function:
Basically, we’re using the function above to check whether or not the ‘source URL’ in our main data tab can be found in the ‘live URLs’ list; and then to return a TRUE if it can be found, and a FALSE if not:
A2 = source URL
‘live urls’!$A$1:$A$28 = our list of live URLs
0 = look for an exact match
FALSE = value if URL is not found
TRUE = value if URL can be found
So far we should now be able to sort through our link data by domain page rank, and also filter out any links that are no longer live. Even doing this should enable you to get a good picture of what’s going on with your link profile, and also help you to filter through this data in a way that’s pretty useful.
If you’re looking to use this information to report back to a client on the condition of their link profile, one very easy thing to do is create some pretty charts to help visualise the page distribution. By creating a simple chart from your domain page rank data (de-duped) it’s extremely easy to see if you might have any issues caused by a large number of links from low quality sites:
What we’re looking for here is proportion. Typically, a good link profile will see a peak around the PR3 mark; but in this case you can clearly see that the site in question has a large volume of links coming from PR0 domains. In terms of decision making, this leads me to a point where I might be planning to remove some of the PR0 links and aim to build a bunch of higher quality links to re-address the balance. To back this up, I would ideally want to create the same chart for a number of competitors so I can easily my page rank distribution side by side with a bunch of other sites in the same space. In terms of explaining your points to clients, that sort of easy comparison is extremely useful.
There’s an absolute ton more stuff you can do with this, so as this post was starting to turn into a bit of a beast I decided it might be best to break it into two parts. Check back for part two next month!
image credit – images of money