Data driven SEO: getting detailed insights by using Free Tools

Small or big SEO projects, all will benefit from a structured approach. With the help of some free tools it is easy to create an overview of the current website performance and simplify certain SEO tasks. First we need to collect website URLs and onpage data, Webmaster tools data, Analytics data and complement that with social and link data. Microsoft Excel will be our main tool to aggregate all the data we need.

Matching datasets on URL basis

all the data sources you will be using are generating reports and downloads in different formats. Always choose one main format for the URLs and stick with it. I always use the version with the domain, so just /blog/ instead of http://www.stateofdigital.com/blog/ since that saves some space in my datasheet.

Matching datasets can be easily done by using the standard VLOOKUP function within Excel. It matches URLs in different tables and outputs the value required from one of the other columns in the matched tables. John Gagnon published a nice how to including the most made mistakes: Excel VLOOKUP Basics & Top 5 Mistakes Rookies Make

Finding all website URLs

First thing we need to built, is a list of all the URLs of the website. The most obvious way to collect all the URLs is to download your sitemap and copy all the plain URLs. Since we also need to collect some onpage metrics, it is faster to use a website crawler like Screaming Frog, Xenu or Microsoft’s Search Engine Optimization Toolkit. ScreamingFrog also gathers a lot of onpage metrics, is free to use but the free version is limited to 500 URLs. (definitely worth the money by the way). In this blog I’ll be using free tools only so let’s go with Xenu and export to TAB separated file and copy the list with all the found URLs, title, out links & in links to Excel. Xenu doesn’t run on Mac, so buy a real PC and get over it. Xenu is capable of crawling big sites so be careful not to slow down your clients’ website. Always inform your client or its hosting contractor when you are trying to crawl their website with a heavy load by using a crawler like this.

xenu-screenshot

Adding onpage metrics

One of the factors I would like to analyse and compare to other metrics is the current meta data (title & description) and the total amount of text on a page. To gather that data, you can use the SEO for Excel plugin by Niels Bosma. Before you download, figure out if you are using 32 or 64 bit version of Excel and install the Microsoft .Net framework. Niels wrote some clear installation instructions: Installation of the SEO for Excel plugin including all the requirements. After installing, you can easily use the following functions to get the required onpage data:

  • =HtmlMetaDescription(A1)
  • =WordCount(A1)
  • =HtmlH1(A1)

seo-for-excel-screenshot

Adding Google Webmaster Tool Data

Next to Google’s Webmaster Tool, you could also use Baidu, Bing or Yandex Webmaster Tools data. For international projects this will shine light on the differences between the performance in the individual search engines. For now, I’ll only focus on the URL performance data from Google. Keep in mind that the numbers mentioned by Google Webmaster Tools are not accurate, so they will be different from the numbers downloaded from analytics dashboards. Also read my article: Why you should not use Google’s WMT data The data is not completely useless since we want to analyse trends and the differences between individual pages.

webmastertools-download-pages

Once you are logged in to WMT, go to the Search Traffic -> Search Queries dashboard. Go to the “Top Pages” tab and remove the Web only filter. Then you need to select the earliest data possible, that will be around 3 months ago since WMT is not publishing any real historical data. Download this table and match the impression, clicks, CTR and average position to the website URLs overview.

Tip: since Google is going to update their mobile index at the 21st of April, you could also download the data for mobile and index separately by using the filter options:

wmt-filters-web-and-mobile

By doing this, you can easily compared mobile versus desktop performance of specific URLs. This only works if you are using a responsive website on the same domain. If you are using a separate domain, m.mydomain.com for example, the performance data of the domain will be in a different account within WMT, since you only get information about a specific sub domain. Always verify all your sub domains individually.

Actual performance data via Google Analytics

Most websites have installed Google Analytics, the market share of Google in the analytics segments is currently somewhere between 55-65% so I’ll use that for this example. Of course, this can done by any serious analytics software package you can get nowadays. As primary dimension we will use “Landing Page”; depending on your analysis we want to download the following data:

  • Segment data for organic (if the website is international, I suggest you segment per unique search engine)
  • Segment data for mobile and desktop traffic separately to analyse difference between Google’s indices
  • Download sessions, % new sessions, bounce rate, session duration, conversion or e-commerce data to start with.

google-analytics-downloads

You can make use of the Excel or CSV file downloads but if you want to work from within Excel, have a look at the Analytics Edge Excel Add-in: Simple Excel Report Automation Next to the connector for Google Analytics, they also have a connection with AdWords, Bing WMT and even Moz. This will make merging data a bit more simple.

Social data

Next to all the onpage data which are directly relevant to the overall SEO performance, I also use these data dashboards to decide to remove specific pages to clean the website and increase the overall quality of the website. Of course I don’t want to remove pages that got social mentions. Within Excel you can use the following functions from the SEO for Excel plugin:

  • =GooglePlusCount(A1)
  • =TwitterCount(A1)
  • =FacebookLikes(A1)

The results are not always good, so a better solution is using Social Crawlytics by Yousaf Sekander. This tool crawls your website and collects social metrics from seven social networks. Next to G+, Tweets and FB interactions, it also shows Stumbles, Pins, LinkedIn interaction and Delicious bookmarks. If you have a big blog or e-commerce website, contact or donate to Yousaf to get some extra credits. Not only for SEO, but overall social performance insights are interesting to see if your content is being appreciated by your community or clients.

state-of-digital-social-crawlytics

Getting insight into which pages are getting social interactions is important to analyse content quality. And some time in the future Google wants to incorporate social signals into their algorithms but they will first need access to all the main social networks to gather enough data to use it in the search result pages.

Link data

Most link database tools require paid accounts to download the data we need but with Majestic, you can verify your website via Webmaster Tools, verify your WMT account within Majestic and you’ll be able to download link data for your verified websites without any additional costs. You can create any Standard, Advanced or Tracking report for your site for free! We need the number of links and domains on a page by page basis, so fill in your domain go to the Pages Tab and download the data.

download-linkdata

Again, SEO for Excel can speed up this process, since they have a Majestic API integration so you can directly download all the metrics you need in the table you already created in the previous steps. Check out the video: Using the Majestic API within SEO for Excel

 The result of adding all these datasets together should be something like the below screenshot:

data-driven-seo

Using your collected data for SEO analyses

In general, this data dashboard is really useful to find quick wins but also usefull when dealing with a content related filter like the Panda algorithm.

Find quick wins

  1. Create a list of page URLs that already established visibility (but don’t have the top positions yet) in Google but lack internal links. Add more internal links to those pages.
  2. Built a list of pages with a good amount of organic traffic but with a high bounce rate. Ask yourself the question: is the content relevant to the used queries? Enrich, update or remove content accordingly.
  3. Find pages that are not ranking well in Google, but getting valuable, good converting traffic via other sources. Content matches the brand or website obviously so try to get to rank those pages in the organic search results.
  4. Look for pages that are visible, have great onpage metrics but no top 3 positions: get more external or internal links pointing to that page.
  5. Pages with low amount of internal links, but a lot of external links and having top 10 positions. Add internal links, additional content and move towards to the top 3 positions!
  6. Create an overview of pages that are ranking well, but not having a great click through rate. Optimize your meta data (title + meta description elements) to increase the CTR. If it really doesn’t improve your results, ask yourself if your websites is expected to be ranked in those positions for the specific queries. Great rankings doesn’t automatically mean users are expecting or clicking your website.
  7. Create a list of URLs that are visible in search results and getting impressions (WMT data) but are not having any real content targeting a specific query space. Optimize the on page content and your site will benefit from it in most cases.

Deep dive into the data

  1. Similar to high bounce rates, look for pages with good traffic numbers and bad conversion rates. Try testing a new layout, different content or different pricing for example.
  2. Next to conversion rates, have a look at the average value per user. Prioritising can be done according to the best performance page in terms of revenue or the average number of conversions per unique user.
  3. Fighting the Panda: does your website has a lot of pages that don’t receive any organic traffic at all? During the past years, many websites started a blog, pushing live content weekly or even daily without doing any research about the target group, query spaces to target or setting goals per piece of content published. This resulted in a great amount of domains getting filtered out of the organic rankings because of the Panda algorithm within Google. If the majority of your pages is not receiving any traffic at all, start removing old, irrelevant pages.
  4. Try clustering your pages in themes. For e-commerce platform you can sum up metrics for specific product categories for example. Are there any specific folders or clusters of pages that are relevant to eachother not receiving any traffic? This could be a sign of a link penalty or Google not finding your website relevant enough for that specific query space.
  5. Pages that perform well (good conversion rate) in AdWords but not ranking yet in the organic results? Push them harder.

For the more advanced SEO / Excel geniuses: you’ll be able to write some custom made filters and macros to automatically filter out all the pages that fit the questions above. Hopefully this makes your life as a SEO consultant more easy and productive. One sheet, with all the data you need to answer a broad set of questions. Any recommendations of insights you get out of the data presented?

About Jan-Willem Bobbink

Jan-Willem Bobbink got addicted with online marketing in 2004, since he build his first international webshop when he was 16. He is currently working as Freelance SEO for global clients and is ambassador for Majestic. His blog can be found at Notprovided.eu. and he shares his cycling adventures at CATW!

  • aj

    amazing stuff

  • I love the idea behind this but it is too Windows centric to be of any use to me (I jump between OSx and Ubuntu)

    At some point I will find tools that do the same thing but for Mac. I might turn it into a blog post, with heavy references to this one of course!

    • Serious SEOs use Windows, Toby. Only posers use Macs. 😉

    • Serious SEOs use Windows, Toby. Only posers use Macs. 😉

      • And only fools judge working environments before spending any real time in them. You are no fool so I can only assume you have tried being an SEO on a Mac 😉

        • I might still be a fool, Toby! 😉

          • Carlos

            Only newbie SEOs use only windows. Try to crawl a very big site with windows. Be linux my friend.

          • That’s why you have enterprise crawlers like DeepCrawl, my friend. Be careful who you accuse of being a newbie. 😉

      • Dawn Anderson

        Or both 😉

    • I’ve never touched a Mac in my life so can’t share any tips for Tools that can be used with Mac OS. Since there are still to many SEOs using the fancy looking Apple products, you would help a lot of people with it.

      For now, maybe try the following tools:

      Crawling: http://wummel.github.io/linkchecker/ or http://www.drk.com.ar/spider.php

      Onpage & social metrics: http://netpeak.us/soft/netpeak-checker/ but not working with Majestic but with Moz for links.

      • Thanks for the links! That will give me a good start 🙂

        • Mike

          I use SEO Powersuite’s tools on my Mac. When they introduce new features, it might take time for them to work alright on Mac, but their support responds to all questions.

  • Ah, Barry… yer a man after me own heart! 😉

  • Good stuff, Jan! I know it took some time to put that together, and it’s appreciated!

  • You can also use Visual SEO Studio http://visual-seo.com/. It’s free and uses courtesy delay to not overload websites.