My Adventures between the Sheets – Using XPATHs, API Requests and Plugins to Help Inform and Monitor Strategy

My Adventures between the Sheets – Using XPATHs, API Requests and Plugins to Help Inform and Monitor Strategy

15th March 2018

I have been deeply and fanatically involved with Google Sheets of late. It’s a tumultuous affair, but despite the ups and downs, the occasional triumphs are worth the effort. The reason for my present devotion is that while the strategies we build for clients at Click are always driven by research, experience and data, all of that is really hard and takes a lot of time – so I began to do additional work in an effort to do less work. I think the majority of people out there will understand that motivation.

There is never, no matter how many people you have working on a project or how much you invest, enough time to do everything that needs to be done. There will always be projects and ideas that have to be pushed back or go untried. What we can do to help that is to try to save time where we can. So I turned to Google Sheets.

Why Google Sheets? Mostly because of a single Google Analytics plug-in which powered a recent blog I wrote for SEMRush on automating analytics reporting.

What was I hoping to achieve?

While I’m not quite there yet, the overall aim of the present adventure is to give me an easier way to monitor content performance and social growth without having to access each of the 17 individual sets of 20 goals we currently monitor for the Click site as well as the various social media native analytics platforms.

This, for me at least – I am an arts graduate, after all – has been a bit of an uphill slog (especially as one of my ideas was rendered impossible by LinkedIn and G+ joining Twitter in deprecating their share counts). However, the process began by outlining a wish list which would give me at least a start in judging the success of content on the Click blog.

  1. A full list of blog posts
  2. A count of total sessions per blog
  3. A count of social sessions per blog
  4. A count of sessions per topic
  5. A count of social sessions per topic
  6. A count of available share totals for each blog
  7. A follower count per social platform
  8. A Domain Authority score
  9. A Domain Rating score

A full list of blog posts/A count of sessions

The reason I wanted this, if not immediately apparent, is firstly because I wanted to be able to judge all of the blog posts against each other, secondly it would be a data source for some of the other reports.

To pull this data into Google Sheets, I installed (well, I had already installed) the Google Analytics Add-On, gave it access to the analytics account and created a report (using the wizard) using sessions as the metric, and page title as the dimension. I then filtered the results to the blog sub-folder and upped the limit on results returned.

The vast majority of the initial set up for this is done through the wizard (select ‘Add-ons’, ‘Google Analytics’, then ‘Create Report’) which has a search function to help you locate the right metrics and dimensions. The only thing I needed to add was the filter.

  • View ID: This comes from your analytics account and will pull through when you select a view in the wizard.
  • Start & End Dates: Fairly self-explanatory, these set the beginning and finish dates your report will be pulling data from.
  • Metrics&Dimensions: Again, you can add these through the wizard – but there’s a full list here if you need to add anything later.
  • Filter: this was the only thing I needed to add manually and limits the data retrieved to the /blog/ subfolder.

A count of social sessions per blog

Essentially I went through the same process here as for the previous, simply adding ‘ga:source’ as a secondary dimension. The main reason for this is that the report splits all sessions by source which fragments the sessions, making it (I thought) easier to pull overall sessions from the first report and the social sessions from the second.

As it turns out, the sessions report also includes visits through a variety of amended URLs, so it was necessary to create a simple pivot table from the data and, in a summary sheet, use a SUMIF formula which required filling each cell with the formula (if anyone can tell me how I can dynamically insert the value of one cell into a formula in another – please let me know in the comments and you will have my eternal gratitude).

The formula I eventually used was: =SUMIF(‘name of pivot table’!A-:A-. “*blog title*”, ‘name of pivot table’!B-:B-) which checks the title text column in the pivot table for a section of the title (or full title) and adds together all session counts which match the title.

This ended up being the approach I took with social sessions, too – with the added bonus of being able to filter out any non-social traffic in the pivot table.

A count of sessions per topic

This count uses the same formula as I ended up using for the previous two. As a full service search agency, Click attempts to offer content for each of the search marketing skill sets, so essentially, in a separate summary sheet, I set up a set of counts for each variation of the terms we use and a total for the combination (so the totals for PPC, paid search, paid social etc. were added together, as were the totals for organic search, SEO and on for each service).

A count of social sessions per topic

Exactly the same process here, except using the sessions count from the social column in the summary row. By this point, I felt almost like I was getting the hang of it. I know that, at some point, I’ll be using Data Studio to visualise the results, but even in Google Sheets, I can get a reasonably quick overview of what is doing what and where.

A count of available share totals for each blog

Thankfully LinkedIn made the announcement that they were deprecating their share count before I started trying to work out how to pull in this data. While it’s far from ideal in terms of measuring content performance, and there are bound to be ways to work through API requests for some or even by scraping for individual URLs, I decided to put this particular endeavour to one side.

A follower count per social platform

As I can’t measure social shares directly, at least without being far better at coding than I am, I moved on to a more oblique measure of social performance – follower counts. There is an in-depth account of the method for each platform here includes access to a copy of the sheet to take the codes from, but suffice to say macros and I were not friends for a time.

The vast majority of it was a case of using IMPORTXMLs to scrape follower counts from the various XPATHs on the social platforms (except LinkedIn, who, by this point, just seem to be being difficult for the sake of it).

The main difficulty with the rest was finding a good XPATH tool which I eventually found here, in a tool called SelectorGadget, and working around the main Twitter site, which was eventually done using a sub site I had no idea existed, but which does.

This meant that I could write a CONCAT formula (=CONCAT(“”,E3)) which ads the Twiter handle to the sub-domain location. The follower count is then pulled in using an IMPORTXML which eventually looked more like a recording of a keyboard cleaning than anything comprehensive to the human brain: =IMPORTXML(F3, “//dl[(((count(preceding-sibling::*) + 1) = 1) and parent::*)]//*[contains(concat( ‘ ‘, @class, ‘ ‘ ), concat( ‘ ‘, ‘alternate-context’, ‘ ‘ ))]”).

The rest of the IMPORTXMLs were slightly less difficult, but equally long winded (if I ever get around to cleaning all of these various sheets up, I’ll make them available if there’s a demand, but there’s the link before to snatch the formulae for social followers).

A domain authority score

I wanted to use the Moz DA as a measure of progress, but after reading through pages of stuff on the API, I knew less than when I started, about everything, my brain melted. So I sadly had to strike this. I’m sure there are ways and means, and Rand Fishkin has a dashboard available on Klipfolio which caused me to sign up for a free trial, but I couldn’t make hide nor hair of that either (sorry for not responding to your emails Dylan from Klipfolio, it is a cause of great shame). After feeling good about my progress with follower counts, therefore, I was satisfactorily re-grounded by DA.

A domain rating score (thank you Ahrefs API requests)

A thousand blessings on whoever designed the Ahrefs API documentation. This, though it took an hour or so to get my head around, was so easy to use for what I wanted that I’m looking in to seeing if I can use any more of the information available to me, and I expanded it a little so I could track the competition also.

The way this works is I set a column up for website, another which used CONCAT to add the address to the first half of the API request (which is the token that gives the request access to your Ahrefs account) =CONCAT(“***AHREFS PRODUCT KEY***=”,A2) and a third which added the actual request to the end =CONCAT(B2,”&limit=1000&output=xml&from=domain_rating&mode=exact”).

Following this, the fourth column triggered the API request, using =IMPORTDATA – this returns a set of XML code, but thankfully, the bottom cell was the DR, ready to be pulled in to a fifth column before I quickly hid all the workings and made it look reasonable. I’m yet to sort the macro to store historic data, but should be able to use an amended version of the one used in the social example linked to above.

What next?

Firstly, I have to try to tidy all of this up and get it into a workable format so that it’s comprehensible to someone that hasn’t spent quite such a long time in its company. After that, there are numerous other things I really want to try to pull in. Some of them really increase the necessity for me to be able to understand more than just the absolute basics of JavaScript, many of them will take time.

However, the hope is that pulling this data through to a central report (probably using Data Studio) will allow me to monitor blog performance, social performance and more, allow me to identify themes that perform well, the success of various topics during different points of the year, and on and on. While a lot of this data is available to me on the various platforms, it’s the possibility of having it all in one place that I hope will allow me to better judge and compare – allowing me to use performance data and a number of other metrics and dimensions to drive strategy and monitor results.

Is there anything State of Digital readers would have done differently (if there’s an easier way, I need to know)? Or do you have any examples of particular successes in the face of data gathering? Let me know!


Written By
John is an internal marketer at Click Consult where he spends his time accruing industry certifications, tinkering with code, plotting strategy and writing articles on all aspects of search marketing. He also contributes to free quarterly search magazine 'Go Viral' and is the occasional host of the podcast (available...
  • This field is for validation purposes and should be left unchanged.