Data driven SEO: getting detailed insights by using Free Tools

Data driven SEO: getting detailed insights by using Free Tools

9th March 2015

Small or big SEO projects, all will benefit from a structured approach. With the help of some free tools it is easy to create an overview of the current website performance and simplify certain SEO tasks. First we need to collect website URLs and onpage data, Webmaster tools data, Analytics data and complement that with social and link data. Microsoft Excel will be our main tool to aggregate all the data we need.

Matching datasets on URL basis

all the data sources you will be using are generating reports and downloads in different formats. Always choose one main format for the URLs and stick with it. I always use the version with the domain, so just /blog/ instead of since that saves some space in my datasheet.

Matching datasets can be easily done by using the standard VLOOKUP function within Excel. It matches URLs in different tables and outputs the value required from one of the other columns in the matched tables. John Gagnon published a nice how to including the most made mistakes: Excel VLOOKUP Basics & Top 5 Mistakes Rookies Make

Finding all website URLs

First thing we need to built, is a list of all the URLs of the website. The most obvious way to collect all the URLs is to download your sitemap and copy all the plain URLs. Since we also need to collect some onpage metrics, it is faster to use a website crawler like Screaming Frog, Xenu or Microsoft’s Search Engine Optimization Toolkit. ScreamingFrog also gathers a lot of onpage metrics, is free to use but the free version is limited to 500 URLs. (definitely worth the money by the way). In this blog I’ll be using free tools only so let’s go with Xenu and export to TAB separated file and copy the list with all the found URLs, title, out links & in links to Excel. Xenu doesn’t run on Mac, so buy a real PC and get over it. Xenu is capable of crawling big sites so be careful not to slow down your clients’ website. Always inform your client or its hosting contractor when you are trying to crawl their website with a heavy load by using a crawler like this.


Adding onpage metrics

One of the factors I would like to analyse and compare to other metrics is the current meta data (title & description) and the total amount of text on a page. To gather that data, you can use the SEO for Excel plugin by Niels Bosma. Before you download, figure out if you are using 32 or 64 bit version of Excel and install the Microsoft .Net framework. Niels wrote some clear installation instructions: Installation of the SEO for Excel plugin including all the requirements. After installing, you can easily use the following functions to get the required onpage data:

  • =HtmlMetaDescription(A1)
  • =WordCount(A1)
  • =HtmlH1(A1)


Adding Google Webmaster Tool Data

Next to Google’s Webmaster Tool, you could also use Baidu, Bing or Yandex Webmaster Tools data. For international projects this will shine light on the differences between the performance in the individual search engines. For now, I’ll only focus on the URL performance data from Google. Keep in mind that the numbers mentioned by Google Webmaster Tools are not accurate, so they will be different from the numbers downloaded from analytics dashboards. Also read my article: Why you should not use Google’s WMT data The data is not completely useless since we want to analyse trends and the differences between individual pages.


Once you are logged in to WMT, go to the Search Traffic -> Search Queries dashboard. Go to the “Top Pages” tab and remove the Web only filter. Then you need to select the earliest data possible, that will be around 3 months ago since WMT is not publishing any real historical data. Download this table and match the impression, clicks, CTR and average position to the website URLs overview.

Tip: since Google is going to update their mobile index at the 21st of April, you could also download the data for mobile and index separately by using the filter options:


By doing this, you can easily compared mobile versus desktop performance of specific URLs. This only works if you are using a responsive website on the same domain. If you are using a separate domain, for example, the performance data of the domain will be in a different account within WMT, since you only get information about a specific sub domain. Always verify all your sub domains individually.

Actual performance data via Google Analytics

Most websites have installed Google Analytics, the market share of Google in the analytics segments is currently somewhere between 55-65% so I’ll use that for this example. Of course, this can done by any serious analytics software package you can get nowadays. As primary dimension we will use “Landing Page”; depending on your analysis we want to download the following data:

  • Segment data for organic (if the website is international, I suggest you segment per unique search engine)
  • Segment data for mobile and desktop traffic separately to analyse difference between Google’s indices
  • Download sessions, % new sessions, bounce rate, session duration, conversion or e-commerce data to start with.


You can make use of the Excel or CSV file downloads but if you want to work from within Excel, have a look at the Analytics Edge Excel Add-in: Simple Excel Report Automation Next to the connector for Google Analytics, they also have a connection with AdWords, Bing WMT and even Moz. This will make merging data a bit more simple.

Social data

Next to all the onpage data which are directly relevant to the overall SEO performance, I also use these data dashboards to decide to remove specific pages to clean the website and increase the overall quality of the website. Of course I don’t want to remove pages that got social mentions. Within Excel you can use the following functions from the SEO for Excel plugin:

  • =GooglePlusCount(A1)
  • =TwitterCount(A1)
  • =FacebookLikes(A1)

The results are not always good, so a better solution is using Social Crawlytics by Yousaf Sekander. This tool crawls your website and collects social metrics from seven social networks. Next to G+, Tweets and FB interactions, it also shows Stumbles, Pins, LinkedIn interaction and Delicious bookmarks. If you have a big blog or e-commerce website, contact or donate to Yousaf to get some extra credits. Not only for SEO, but overall social performance insights are interesting to see if your content is being appreciated by your community or clients.


Getting insight into which pages are getting social interactions is important to analyse content quality. And some time in the future Google wants to incorporate social signals into their algorithms but they will first need access to all the main social networks to gather enough data to use it in the search result pages.

Link data

Most link database tools require paid accounts to download the data we need but with Majestic, you can verify your website via Webmaster Tools, verify your WMT account within Majestic and you’ll be able to download link data for your verified websites without any additional costs. You can create any Standard, Advanced or Tracking report for your site for free! We need the number of links and domains on a page by page basis, so fill in your domain go to the Pages Tab and download the data.


Again, SEO for Excel can speed up this process, since they have a Majestic API integration so you can directly download all the metrics you need in the table you already created in the previous steps. Check out the video: Using the Majestic API within SEO for Excel

 The result of adding all these datasets together should be something like the below screenshot:


Using your collected data for SEO analyses

In general, this data dashboard is really useful to find quick wins but also usefull when dealing with a content related filter like the Panda algorithm.

Find quick wins

  1. Create a list of page URLs that already established visibility (but don’t have the top positions yet) in Google but lack internal links. Add more internal links to those pages.
  2. Built a list of pages with a good amount of organic traffic but with a high bounce rate. Ask yourself the question: is the content relevant to the used queries? Enrich, update or remove content accordingly.
  3. Find pages that are not ranking well in Google, but getting valuable, good converting traffic via other sources. Content matches the brand or website obviously so try to get to rank those pages in the organic search results.
  4. Look for pages that are visible, have great onpage metrics but no top 3 positions: get more external or internal links pointing to that page.
  5. Pages with low amount of internal links, but a lot of external links and having top 10 positions. Add internal links, additional content and move towards to the top 3 positions!
  6. Create an overview of pages that are ranking well, but not having a great click through rate. Optimize your meta data (title + meta description elements) to increase the CTR. If it really doesn’t improve your results, ask yourself if your websites is expected to be ranked in those positions for the specific queries. Great rankings doesn’t automatically mean users are expecting or clicking your website.
  7. Create a list of URLs that are visible in search results and getting impressions (WMT data) but are not having any real content targeting a specific query space. Optimize the on page content and your site will benefit from it in most cases.

Deep dive into the data

  1. Similar to high bounce rates, look for pages with good traffic numbers and bad conversion rates. Try testing a new layout, different content or different pricing for example.
  2. Next to conversion rates, have a look at the average value per user. Prioritising can be done according to the best performance page in terms of revenue or the average number of conversions per unique user.
  3. Fighting the Panda: does your website has a lot of pages that don’t receive any organic traffic at all? During the past years, many websites started a blog, pushing live content weekly or even daily without doing any research about the target group, query spaces to target or setting goals per piece of content published. This resulted in a great amount of domains getting filtered out of the organic rankings because of the Panda algorithm within Google. If the majority of your pages is not receiving any traffic at all, start removing old, irrelevant pages.
  4. Try clustering your pages in themes. For e-commerce platform you can sum up metrics for specific product categories for example. Are there any specific folders or clusters of pages that are relevant to eachother not receiving any traffic? This could be a sign of a link penalty or Google not finding your website relevant enough for that specific query space.
  5. Pages that perform well (good conversion rate) in AdWords but not ranking yet in the organic results? Push them harder.

For the more advanced SEO / Excel geniuses: you’ll be able to write some custom made filters and macros to automatically filter out all the pages that fit the questions above. Hopefully this makes your life as a SEO consultant more easy and productive. One sheet, with all the data you need to answer a broad set of questions. Any recommendations of insights you get out of the data presented?


Written By
Jan-Willem Bobbink got addicted with online marketing in 2004, since he build his first international webshop when he was 16. He is currently working as Freelance SEO for global clients and is ambassador for Majestic. His blog can be found at and he shares his cycling adventures at CATW!
Recap Friends of Search 2016  #FOS16
Latest Post from Jan-Willem
Analytics Recap Friends of Search 2016 #FOS16
16th February 2016
  • This field is for validation purposes and should be left unchanged.