SEO Forecasting: A Poor Allocation of Resource

As I’m sure we all know, SEO forecasting is one of the most tedious and frustrating tasks that many of us have to cope with for pitches. In my short time at OMD I am quickly learning how important these forecasts are to larger clients and as a result our team have been forced to improve the sophistication of our forecasting models.

When I first spoke to Bas about writing this post, my intention was to write a bit of a rant about how costly and time-consuming forecasting is and, as many have already argued, how “a forecast for SEO is not even worth the paper it is printed on.” However, given our involvement in an ongoing pitch and the subsequent improvements we have made to our model, I think it is worth sharing my findings and suggestions. I would also love to hear what others in the industry have come up with to cope with the myriad issues thrown up by this reporting.

So, if you are a technical SEO and you already know how horrible the data is, I urge you to skip to the end of the post and to weigh in on the comments below about some of the techniques and modelling you have used to make life easier (e.g. Excel hacks, estimation tricks, the number of terms used when creating a sample, how you deal with the long tail, etc.). That said, you may find some of the statistics and arguments useful should you find yourself needing to forecast!

However, if you are a client, if you have to deal with clients or if you have to sell to clients I strongly urge you to read the below issues with the data most SEOs use for forecasting and also read about the amount of time this sort of work can take – time that we could be spending to make your website better!

Issues with the Data – Mo’ Variables, Mo’ Problems

As most folks who have ever had to do a forecast will know, there are limited places from which to get relevant data, and whether this data (flawed as it is) means anything at all depends very much on the type of forecasting and the budget available to you. Additional requirements put in place by clients (e.g. we only want non-brand, we only want long tail, etc.) often limit the accuracy further, though they can save a good bit of research time.

AdWords Keyword Tool
When forecasting in new industries or areas where you do not have a great deal of analytics and PPC data, you will be forced to use the data from the Google AdWords Keyword Tool (please note that the API data is generally a bit better). Amongst the SEO community this data is notoriously inaccurate.

For example, in one recent exercise we found that on average the purported exact match volume for keywords being reported by the tool was off by more than 700% (e.g. the tool was reporting a total search volume for a term as 65 and we had seen 400+ visits for that term the previous month). I am sure there are worse examples and the fact of the matter is that we obviously can only rank for one spot – thus the total volume for this term is not only likely to be 400+ but more realistically will be well over 1,000.
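To make the size of that gap concrete, here is a minimal sketch of the arithmetic for the single term quoted above (65 reported searches against 400 observed visits). The function name is my own. Note that visits are only a floor on real search volume, so the true error is larger, and that this one term is separate from the 700% average across the whole exercise.

```python
def percent_error(reported_volume: float, observed_visits: float) -> float:
    """Return how far observed visits exceed the tool's reported volume, in percent."""
    if reported_volume <= 0:
        raise ValueError("reported_volume must be positive")
    return (observed_visits - reported_volume) / reported_volume * 100.0


# Figures from the example above: the tool reported 65, we saw 400+ visits.
print(round(percent_error(65, 400)))  # 515
```

Because we only capture a share of the searches from one ranking position, treat the result as a lower bound rather than a measurement.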

This data is notoriously bad but often it is the only data we may have (particularly new or small agencies working for new or small businesses). However, if we are lucky, we are already receiving traffic for some of these terms or can run tests through PPC to try and gauge from where this data is coming and hopefully get more accurate insights.

*I would suggest using Richard Baxter’s keyword tool here as it relies on the volume from the API and allows you to get the volumes for specific terms much more quickly.

The 70/30 split (or was it 80/20? No, 60/40)
The next major issue with the data was very aptly pointed out in a pitch situation after our first attempt at a forecasting model that relied heavily upon the AdWords data. The above data (though incredibly inaccurate) was designed for PPC purposes as Google clearly has no need or incentive to provide SEOs with tools. Therefore, it is probably safe to assume that this data is intended to show all search volume rather than share of the search volume that is going strictly to organic search.

It is essential then that, if we are to use the AdWords Keyword Tool data (horrible as it is), we also discount the traffic opportunities through organic search by ~30% (or more). This one is easy enough to overlook, but it also adds another element of inaccuracy and uncertainty because this split (and the degree to which it exists) varies enormously and, as Pete Young points out, “is due a significant review.” Hence: another layer of uncertainty.
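As a rough sketch of that discount, the function below strips an assumed paid share out of a total search volume. The 30% default comes from the figure above; the function name and the example volume of 1,000 are my own illustrative assumptions, and the other splits simply mirror the 80/20 and 60/40 debate in the heading.

```python
def organic_opportunity(total_volume: float, paid_share: float = 0.30) -> float:
    """Return the search volume left for organic results after removing the paid share."""
    if not 0.0 <= paid_share <= 1.0:
        raise ValueError("paid_share must be between 0 and 1")
    return total_volume * (1.0 - paid_share)


# Hypothetical term with 1,000 total monthly searches, under three assumed splits.
for paid_share in (0.20, 0.30, 0.40):
    organic = organic_opportunity(1000, paid_share)
    print(f"paid share {paid_share:.0%}: about {organic:.0f} organic searches")
```

Running the same term through several splits is a cheap way to show a client how much the final number moves on this one assumption alone.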

Analytics Segments
One way to improve the reliability of the first set of data (that is to say, volume and potential volume for certain keywords) is to use analytics data for existing traffic to your site for a sample of terms. Sounds great right? If we don’t have to deal with the Keyword tool data we should get much more accurate predictions!

Unfortunately this is not the case. One of the most sensible ways to look at the traffic data would be to create a filter to catch all organic traffic around the term or set of terms in which you are interested. This could save you a significant amount of time against looking at every single term on a case by case basis and would allow you to spit the data out into Excel (much easier to manipulate data here).

The problem with this (particularly for bigger sites over longer time periods) is the increasingly common and unfriendly reminder that the report is based on sampled data.

Samples can be good and accurate though! …

Not so much in this case. I think my point here is beginning to become clear. AdWords tool volumes are commonly off by 50% (in our experience anyway), and can be off by 1,000% or more. Analytics and PPC data, whilst more accurate than the other options, will still lead to sampled data when looking at large sites or long time periods, and +/- 99% accuracy on my forecasting model is not something that is going to sit well.

Click Through Rate (CTR) Data
It is fairly safe to assume that most of your SEOs doing forecasting projections rely quite heavily upon the AOL click through rate data to forecast increases in traffic based upon position in the search engine. If you are lucky, your agency keeps track of a broad set of terms and tries to look at the impact of different rankings and so forth, but almost all of the forecasting models I have seen use this data rather than the in house data (which requires a load of servers, hacks and ongoing attention).

So, what’s the problem?

This click stream data came from a leaked document from AOL regarding the behaviour of roughly 650,000 searchers across 20 million queries over a period of three months (source). This is great data from a massive set, but the problem is that the landscape has changed a great deal. We don’t have one SERP or one “type” of SERP anymore. As I’ve mentioned in the past, the landscape is constantly changing and many queries now throw up local results (including a map), shopping results, etc.

And just to make things even more interesting, it is thought that adding microformats increases CTR as well!

So, it would appear that we have yet another layer of complication and another imprecise variable leading to further muddied waters.
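For anyone who has not seen one of these models, the arithmetic behind a CTR-based estimate is very simple, which is exactly why a stale CTR table does so much damage. The sketch below is illustrative only: the 4.86% figure for position five is the one cited later in this post, while every other value in the table, the function name, and the example volume are hypothetical placeholders that you would replace with your own data.

```python
# CTR by ranking position. Only position 5 (4.86%) is a figure cited in this post;
# the other values are HYPOTHETICAL placeholders, not real measurements.
CTR_BY_POSITION = {1: 0.40, 2: 0.12, 3: 0.08, 4: 0.06, 5: 0.0486}

# Assumption: positions missing from the table earn no clicks (deliberately conservative).
UNLISTED_CTR = 0.0


def estimated_visits(monthly_searches: float, position: int,
                     ctr_table: dict = CTR_BY_POSITION) -> float:
    """Estimate monthly visits as search volume multiplied by the CTR for a position."""
    return monthly_searches * ctr_table.get(position, UNLISTED_CTR)


# Hypothetical term with 1,000 monthly searches, ranking fifth.
print(round(estimated_visits(1000, 5), 1))  # 48.6
```

Every error in the volume and every error in the CTR multiplies straight through to the forecast, so the output is only ever as good as the worse of the two inputs.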

Final Word on the Data (TL;DR)
In the very best case scenario we would be asked to improve rankings on a certain set of terms for which we have both PPC and Organic traffic data. Failing this you would at least be able to pay to run a number of short PPC campaigns to capture impression data for the terms in question (don’t forget to set the phrases to exact match!).

However, even with the most accurate possible data we are dealing with a number of unknown and ever-changing variables (CTR, popularity of queries, analytics and user behaviour, etc.), any of which can drastically throw off this information. Without proper funding and resource it is a time-consuming task that will not lead to reliable results (no matter how scientifically we try to treat a decidedly unscientific task).

Issues with the Task Itself

Time is Money
At the end of the day, this sort of research (to be done “well”) requires a great deal of time. Setting aside the fact that my time is valuable and I should be paid for work I do on this sort of thing there is a much more important factor: my aim is to help clients increase their traffic, increase their converting traffic (either through CRO or by conducting smarter keyword research), and help them provide a better experience for their users to help them make more money.

I would love to tell them just how much money I can make them, but for the amount of time forecasting takes me I’d really rather just get to work on actually improving their performance and increasing revenue – we can discuss whether it has been satisfactory after a short time period and most contracts are more forgiving in the first few months.

“Getting Burned”
Every single time I have conducted forecasting at the request of a client I have been told how they have been “burned by this sort of thing in the past.” I am by no means having a go at any individual here because literally every client has said it to me and I believe it is largely the result of SEOs (and sales teams!) promising the world when they cannot really deliver. They can blame the data if they like, but ultimately most times it will be because they have stretched the truth or promised something that they do not genuinely believe they can deliver.

There are a number of causes to this but I urge the utmost caution to all future clients: it is about trust. If it sounds too good to be true, it probably is! I know these points sound a bit cliche but at the end of the day it all comes down to trust. Some of these contracts are worth a great deal of money and it is easy to get overzealous with this sort of thing.

It is my aim to always be honest, to be realistic, but also to give ranges. When doing a forecast I try to provide a number of cases: where you will be if you continue as you are, where you will be if you do nothing, where you will be if you hire us, and where you will be if you actually help us along the way.

Many people provide pie-in-the-sky figures and then blame the client for not implementing their ideas. I would much prefer to tell the client that the best results will not come without their cooperation and show them realistic estimations based upon their work with us versus our work without them versus their work with someone else. Timelines do not always permit such scrutiny, but it is my aim to deliver forecasting with these ranges to help paint a clear picture of the opportunity.

Pro-tip to clients and salespeople: do not promise something you cannot deliver and always scrutinise forecasts – even the best forecasts rely on heavily flawed data. The reason clients get burned is either because they believe the overhyped and unrealistic projections OR because they put on pressure to achieve unrealistic results founded upon unrealistic expectations and unreliable numbers.

Keyword Research
For a forecast to be done properly (and as a reasonable demand on the agency or consultant) it really requires in-depth knowledge into your industry. Because of non-competes and other legal issues it is rare that we will have an existing client doing the exact same thing you are in the exact same industry. To get the most out of forecasting you really need to have this knowledge either by way of previous (recent!) work from a trusted source or to conduct the work yourself.

For this very reason I would strongly advise selling a piece of keyword research (perhaps as part of or as a complement to a Site Audit) before agreeing to undertake an in-depth and robust forecasting model. With this data you will be in a much better position to look through the SERPs and you will already know where the biggest opportunities lie.

You Spent How Long?!
Somewhat separate to the above point that I would rather be improving a website and a company’s revenue than working on forecasting models, I think the point remains that to do forecasting well it takes an excessive amount of time (even with existing models).

Some agencies have dedicated sales teams with support staff to produce these sorts of things and treat that as overhead or new business resource. However, at the agencies for whom I have worked this modelling has been handled by some of our most experienced SEOs and/or dedicated statistics and business forecasting professionals.

To do this work properly (including the keyword research if not already done) we are looking at anywhere from 20 to 60 hours of labour. This is a rather large range because experience counts: the more times you have done the forecasting, the more quickly you will work. Excel wizardry and specialised support help too, but many agencies don’t have these.

If basic forecasting is requested it can obviously be completed in much less time but reaching a model in which you are confident (at least in terms of the terms chosen, the regressions used, and so forth) is a huge piece of work. If this sort of forecasting is required (for the board or whomever else) that is fine but requires compensation and it is no wonder that most forecasting predictions are thin and usually about as good as a finger in the air at predicting results.

Ever Changing Rankings
As we’ve all read ad nauseam, rankings don’t mean what they used to. We’ve got “local”, “seven packs”, “personalised search” and it seems like a hundred other varieties of results pages. There are fewer and fewer “traditional” ranking results, and this requires a more holistic approach to SEO as a whole but also truly limits our ability to predict and forecast results. If Google decides to drastically change the layout of the SERPs tomorrow such that only paid advertisements are showing on page one, our assumptions that we could get you to position six with the corresponding traffic are no longer valid.

At the end of the day the work is interesting because things are always changing and a good SEO can stay on top of these changes. But with massive changes to the landscape of the SERPs results may not come overnight and that term for which you moved from eighth position to sixth position may now be on page two!

Final Suggestions to Sales, Clients and People Involved in Pitches
If you are thinking about asking for forecasting or selling in forecasting, I would strongly advise that you either require complete cooperation (with access to all available data) and bill the time accordingly, or, better still, sell the initial work that is likely to improve the quality of this work as a one off project and insist that all forecasting should wait until extensive keyword research is done. Your SEO team will thank you and will likely deliver better results.

Please remember the following when considering forecasting:

SEO forecasting is not accurate due to data limitations and should not be relied upon.
SEO forecasting will be accurate to the extent the data available is accurate (translation: they may have to pay to run some PPC testing or provide you with existing PPC results).
The data will be even worse if you do not provide the agency or team with analytics access.
Good forecasting should be paid for. Our most recent forecasting (which included enlisting the help of our Data Science team) has taken up a minimum of 20 hours. The model we have built is sound, but this time would almost always be better spent consulting in our area of expertise.

The conclusion here is quite obvious, but just to be perfectly clear: SEO forecasting is unreliable by nature of the data. Intelligent estimates can be produced though doing so is immensely time consuming and a poor allocation of resource.

When you are looking for an agency, judge them upon how you get on with them, the degree to which you trust them, what others have to say about them and their past results. Anyone can make promises they won’t fulfil, but this is not the way to pick an agency.

Rant Over – Tune in Technical SEOs!

I hate to write a post without providing at least one takeaway tip for SEOs because the bottom line is: clients are still going to request these forecasts. The tip, unfortunately, does very little to help with the accuracy of the data though it has yielded a considerable time saving. If you are looking for basic instruction to predict yearly traffic look here first.

Our Current Model & Other Techniques
Our model has come from countless hours and attempts at forecasting and the need to reduce time spent on these activities. The recent pitches in which we have participated have required a level of attention to detail we’ve not previously seen and as a result we have teamed up with our Data Sciences team (please visit your resident Excel/Stats expert if you have one in your team) to produce a spreadsheet and a few models that can be used again and fairly easily manipulated.

Previously we relied on manually going through the SERPs for as many as 50 keywords and working backwards from where we reasonably thought we could get a site ranking in twelve months’ time (based upon competitor research, intuition, and experience in the industry). In this model we had to manually enter the progress month on month, multiply this by the volume and CTR data we had available, apply a seasonal impact and estimate traffic growth month-on-month for these terms. We then took the average percentage uplift in traffic (across these terms compared to Year-on-Year data) and applied it to the rest of the data.
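The manual steps above can be sketched in a few lines. This is my reading of the process rather than our actual spreadsheet: the function names are invented for illustration, and every number in the example (CTR values, volume, seasonality, baselines) is hypothetical.

```python
def monthly_visits(volume, ranks, ctr_by_rank, seasonality):
    """Forecast visits per month for one keyword.

    volume:       estimated monthly searches for the term
    ranks:        expected ranking position for each forecast month
    ctr_by_rank:  mapping of position -> click-through rate (0 to 1)
    seasonality:  per-month multiplier (1.0 = an average month)
    """
    if len(ranks) != len(seasonality):
        raise ValueError("ranks and seasonality must cover the same months")
    return [volume * ctr_by_rank.get(rank, 0.0) * season
            for rank, season in zip(ranks, seasonality)]


def average_uplift(forecast_totals, last_year_totals):
    """Mean fractional uplift of forecast traffic over last year's, across sample terms."""
    uplifts = [(forecast - baseline) / baseline
               for forecast, baseline in zip(forecast_totals, last_year_totals)]
    return sum(uplifts) / len(uplifts)


# HYPOTHETICAL inputs: one sample term forecast over two months.
ctr = {1: 0.5, 2: 0.25}
visits = monthly_visits(1000, [2, 1], ctr, [1.0, 1.2])

# Compare the sample term with last year's traffic, then apply the
# average uplift to the rest of the site's (hypothetical) baseline.
uplift = average_uplift([sum(visits)], [500.0])
rest_of_site_forecast = 10_000 * (1 + uplift)
```

The weak point is the last step: the uplift measured on a hand-picked sample is assumed to hold for every other term, which is a big assumption for the long tail.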

This wasn’t a perfect approach (due to the data issues I mention above) but we felt it was fairly reliable and something we felt offered a reasonable approximation; however, it was exceedingly time consuming!

There were some obvious problems with this model though so we have upped the ante a bit and I would strongly encourage others to look into ways to automate these processes and provide feedback or tips in the comments below.

Upgrade 1: Use Existing Data to Automate
Obviously the manual task of filling out month-by-month ranking improvements is not scalable in the slightest. However, it’s also not reasonable to just assume “oh, we’ll improve 2 positions each month until we reach #1”. In reality, the positions jump around a bit and there is usually a quick uplift at the beginning (more pronounced when you have a strong site creating new content) before things get more difficult – it’s not linear.

There are a number of ways to automate this and I strongly suggest enlisting the help of someone with mathematics and Excel expertise. Once you have created regressions for the improvement anticipated over twelve months, you can automate the process such that you enter the current rank and the rank that you expect in twelve months’ time and Excel returns the rankings for the intermediary 10 months, showing how you get there. Following our consultation with the Data Science team, we achieved this automation using coefficients and logs based upon where we are now and where we expect to be in 12 months, and we used our previous manual method to sense check the results.
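I can’t share our exact coefficients, but the general idea can be sketched with a plain logarithmic curve. Treat the formula below as an assumption chosen because it front-loads the gains, not as the model our Data Science team built: month 1 is the current rank, month 12 is the target, and the 10 months in between are filled in automatically.

```python
import math


def rank_trajectory(current_rank: float, target_rank: float, months: int = 12) -> list:
    """Interpolate rank from month 1 (current) to the final month (target) on a log curve.

    Assumed curve: progress(m) = ln(m) / ln(months), so most of the
    improvement lands early and later gains are harder won.
    """
    if months < 2:
        raise ValueError("months must be at least 2")
    span = math.log(months)
    return [current_rank - (current_rank - target_rank) * math.log(month) / span
            for month in range(1, months + 1)]


# Hypothetical term: currently 20th, expected to reach 3rd after twelve months.
for month, rank in enumerate(rank_trajectory(20, 3), start=1):
    print(f"month {month:2d}: position {rank:.1f}")
```

In a spreadsheet the same thing is one formula per month cell referencing the two input cells, which is what makes the approach reusable.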

This will not perfectly mimic improvements because there tends to be a fair bit of bouncing around and usually ranking improvements are inconsistent, though we have found the data to correlate well with the previous method. This has helped save an extraordinary amount of time going forward and once created can be used over and over again.

In an ideal world these spreadsheets could be created to account for different types of SERPs (e.g. enter “3” into a cell for one type of curve, “2” for another) and so forth and I have no doubt that this can be taken a great deal further but the level of analysis required to do this truly would require a dedicated effort with a great deal of resource.

One area that we are still working on automating a bit better, and on which I would love to see advice or tips from others, is accounting for the long tail in these predictions. This is an obvious area of difficulty because a site with a lot of authority can produce new and meaningful content and usually rank quite easily for long tail terms. However, forecasting in this area remains extremely difficult due to the knock-on effect of other improvements across the site as well as the lack of search volume data for long tail queries.

One obvious issue with using the regression model is that it assumes all keywords will be weighted the same, and it is unrealistic to try to optimise for a number of terms all at the exact same time (onsite and offsite). However, you can work around this by creating groups of keywords based upon priority level and staggering the start dates for improvement (reflecting more modest targets if they won’t be targeted until month 9: “where could we be in 3 months for this term?”), or by fitting improvement to more gradual curves.
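One way to sketch the staggered start is to hold a keyword at its current rank until its priority group comes up, then run the improvement curve over whatever months remain. The logarithmic curve, the function name and the example figures are all assumptions for illustration; any curve that suits your data could be swapped in.

```python
import math


def staggered_trajectory(current_rank: float, target_rank: float,
                         start_month: int, horizon: int = 12) -> list:
    """Hold the current rank until start_month, then improve on a log curve to the target.

    target_rank should be the more modest position reachable in the months
    left between start_month and the end of the forecast horizon.
    """
    if not 1 <= start_month < horizon:
        raise ValueError("start_month must be between 1 and horizon - 1")
    span = math.log(horizon - start_month + 1)
    trajectory = []
    for month in range(1, horizon + 1):
        # Months before the start count as step 1, where ln(1) = 0 means no progress.
        step = max(1, month - start_month + 1)
        progress = math.log(step) / span
        trajectory.append(current_rank - (current_rank - target_rank) * progress)
    return trajectory


# Hypothetical low-priority term: 30th today, not targeted until month 9,
# with a modest target of 18th by month 12.
late_group = staggered_trajectory(30, 18, start_month=9)
```

Each priority group gets its own start month and target, and the monthly traffic for all groups is then summed as before.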

Finally, as I mentioned earlier, one nice filter that we have added (for the cases where we are forced to rely on the AdWords data) is simply a column that allows you to tweak the PPC/SEO split (dependent upon number of ads served in that SERP, etc.).

Essentially, once you have set up this spreadsheet it is considerably easier to adjust variables (by changing your estimated final ranking position) based upon an Optimistic, Pessimistic, and Independent outcome and adjust for changes to ads served for a given term, priority levels, etc.

Pro-Tip: Providing different layers of predictions allows you to illustrate the importance of working WITH the client and that with their participation your results can be considerably stronger. Showing them the difference between where they would be without you, where they’ll be if they leave it up to you, and where they’ll be if they work with you and implement your suggestions can be a very persuasive argument and help ensure things get done!

Bonus Tip to deal with CTR Accuracy
As mentioned above, there are serious concerns dealing with the CTR Accuracy. Although we discussed ways to potentially increase the accuracy of the search volume data (by running PPC campaigns for sample terms and by using existing Analytics data when not based on samples) this still doesn’t account for the lack of accuracy with the CTR data (based on 2006).

As mentioned, the best possible way to deal with this would probably be to monitor the SERPs for quite some time and do your own testing based upon the type of result and the position in which you are ranking (using full referral strings or other shadier methods).

However, one more accessible way to cope with this outdated CTR data was provided to me by Sarah Carling:

I think this tip is probably most useful amongst rankings outside of the top 3, but it is an interesting alternative to relying upon old data.

CTR accuracy will always be limited, and it is important to remember that the type of results shown for a given query will have a significant impact. If you cannot rank in Google Places (as a result of a lack of a retail location in that area), it is worth being realistic about the number of organic listings available for a given term and evaluating whether you are truly likely to jump onto the first page for these terms before promising the traffic from them. With the increasingly geo-targeted and geo-sensitive nature of results this may cause issues for a number of online businesses, but it’s important to be realistic!

Example: in the above SERP, ranking “5th” for the term “gym london” in the natural listings (i.e. not Places) is not going to give you anywhere near 4.86% click through anymore. It may not even get you onto the first page, which means we’re looking at more like <0.66% CTR.
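To put a number on how far off the old assumption is, here is the ratio between the two CTR figures quoted in that example. The function name is mine; since the realistic CTR is below 0.66%, the result is a minimum.

```python
def overstatement_factor(assumed_ctr: float, realistic_ctr: float) -> float:
    """Return how many times larger the assumed CTR is than the realistic one."""
    if realistic_ctr <= 0:
        raise ValueError("realistic_ctr must be positive")
    return assumed_ctr / realistic_ctr


# 4.86% assumed for "5th" versus at most 0.66% once the listing falls off page one.
factor = overstatement_factor(0.0486, 0.0066)
print(f"{factor:.1f}x")  # 7.4x
```

In other words, a forecast built on the old table would promise at least seven times the traffic this ranking is likely to deliver.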

The moral of the story here is that you have to be realistic and that good forecasting will take time (to give an accurate estimate of where you will rank you will probably have to look at each SERP manually to see the type of result displayed). There are ways to automate and some ways to improve data accuracy, but the task is still incredibly demanding.

We will continue to be asked to forecast, but the story remains the same: doing it well takes time, time is money, and the forecast is only as valuable as the data (not very). Either forecasting will become increasingly expensive or it will continue to yield unreliable results. The best advice I can give to fellow SEOs is to continue to push back on this, cite the above reasons and in the meantime to automate as much of the process as possible – oh, and make sure you learn Excel or hire someone who knows it inside and out.