Cleaning up your Google Analytics traffic sources

29th April 2020

We rely on Google Analytics to prove our worth as marketers. We use it to identify problems and trends. It’s the way we measure progress. It’s also often wrong.

Google Analytics needs some fine-tuning to ensure it is accurately tracking your data. Over time, issues creep in, polluting the data and muddying the insight you draw from it.

It is important to regularly audit your Google Analytics account to identify if any traffic sources are being categorised incorrectly or over-reported.

How to audit your traffic

The best first step in auditing your Google Analytics traffic sources is to go to “Acquisition” > “All Traffic” > “Source/Medium”.

Source/Medium report in Google Analytics

Set the date range to a decent length of time: either the period since your last Google Analytics traffic audit, or at least 6 months.

Take a look at your sources and mediums. This is the starting point of your audit.

Wrongly categorised traffic in Google Analytics

There’s no scientific method to what we’re about to do. Essentially, you need to look through your source and medium categories and check whether anything looks wrong to you. For instance, in the screenshot above you can see there are two sources labelled as “referral”: getbottraffic4free.xyz and t.co.

Getbottraffic4free.xyz is a spam bot sending traffic to the website. It is not genuine traffic and therefore will be polluting the data available in Google Analytics.

T.co is actually traffic from Twitter: visitors who clicked on a link that uses the platform’s URL shortener.

Both of these examples show data issues that would need cleaning up in this Google Analytics account.

Common data issues in Google Analytics

There are several common problems that are often seen in Google Analytics accounts that have been left to run for a period of time without auditing.

Miscategorised traffic

Google Analytics identifies the source of traffic coming to a website through the referral source information which is transferred by the browser when the visitor arrives at the website. Sometimes Google Analytics gets this wrong.

As a result, we see traffic being categorised as “referral” when in fact it would be better categorised as “social” or “affiliate”.

Misused UTM parameters

UTM parameters (UTM stands for “Urchin Tracking Module”) are parameters that can be added to a link to provide referral source data.

For example, the following website “example.com” wants to be able to identify when a visitor arrives on its site by clicking on a Google My Business listing link. Google Analytics will natively report anyone clicking on a Google My Business listing link as “google / organic”, however it will not specify that it was a Google My Business listing that was the source of the traffic. The website could use the following URL on its Google My Business listing to better classify that traffic within Google Analytics:

https://example.com?utm_source=google-my-business&utm_medium=organic

Traffic that clicked on this link would be tracked as “google-my-business / organic”, because Google Analytics reports UTM values exactly as they are written in the URL.
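As a rough illustration, a tagged link like the one above can be built in code rather than typed by hand, which helps avoid typos in the parameter names. This is a minimal sketch using only Python’s standard library; the function name add_utm is hypothetical and not part of any Google tool.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_utm(url, source, medium, campaign=None):
    """Return `url` with utm_source and utm_medium (and optionally utm_campaign) appended."""
    parts = urlsplit(url)
    # Keep any query parameters that are already on the link.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("utm_source", source))
    query.append(("utm_medium", medium))
    if campaign:
        query.append(("utm_campaign", campaign))
    return urlunsplit(parts._replace(query=urlencode(query)))


tagged = add_utm("https://example.com", "google-my-business", "organic")
print(tagged)
```

Writing the values in lowercase and without spaces keeps them consistent with how they will appear in the Source/Medium report.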

UTM parameters are extremely helpful in enabling Google Analytics to accurately report the source of traffic coming to a site when it couldn’t natively acquire that data.

They are also very good at corrupting traffic data in Google Analytics if used incorrectly.

UTM links on internal pages

One example is a UTM parameter added to a link on a website that points to another page within that same website. Any visitor who arrives on the site and clicks that link stops being reported under the traffic source that originally brought them there. For the rest of their journey through the website they are reported as whatever the UTM on the internal link specifies.

For instance, a visitor who arrives on the website by conducting a Google search and clicking on an organic listing would be categorised as “google / organic”. Suppose the website wants to track how many visitors click the offer banner on its home page, and so adds a UTM to the banner URL, like “https://example.com?utm_source=home-page&utm_medium=banner&utm_campaign=homepage-offers”. Any visitor who clicked it would register as a new session in Google Analytics with the source/medium “home-page / banner”.

The session that began with the Google search would end on the home page, and a new session would start with “home-page / banner” as the source of traffic. This means that if the visitor went on to purchase from the offers page linked to from the home page, the sale would be attributed to “banner” and not “organic”. The true effectiveness of SEO activity for that site would be masked.
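If you have a list of the links on your site (from a crawl, for example), a short script can flag the internal ones that carry UTM parameters. The sketch below is one possible approach, assuming the links are available as plain strings; the function name and the sample links are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def internal_utm_links(links, site_host):
    """Return the links that stay on `site_host` and carry any utm_ parameter."""
    flagged = []
    for link in links:
        parts = urlsplit(link)
        # Relative links have no hostname, so they stay on the same site.
        same_site = parts.hostname in (None, site_host)
        has_utm = any(key.startswith("utm_") for key in parse_qs(parts.query))
        if same_site and has_utm:
            flagged.append(link)
    return flagged


links = [
    "https://example.com/offers?utm_source=home-page&utm_medium=banner&utm_campaign=homepage-offers",
    "/contact",
    "https://twitter.com/share?utm_source=example",
]
flagged = internal_utm_links(links, "example.com")
print(flagged)  # only the first link is internal and UTM-tagged
```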

UTM links without the medium set

In the earlier example for tagging a Google My Business listing with a UTM I recommended including both a “source” and “medium” parameter. This should always be the case at a minimum, although additional parameters can also be included (see Google’s own UTM builder for more details).

The risk is that, once a link carries UTM parameters, Google Analytics uses them in place of the data it would otherwise have pulled from the referral information.

In the Google My Business example, if you do not include the “medium”, Google Analytics will not fall back on the “organic” medium it would previously have used for a visitor from a Google My Business listing. It will just omit the medium altogether. Therefore, in Google Analytics, the “source / medium” would look like “google-my-business / (not set)”.
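A quick check before a tagged link goes live can catch a missing medium. The following is a minimal sketch, assuming that “utm_source” and “utm_medium” are the two parameters you always want present; the function name is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

REQUIRED = ("utm_source", "utm_medium")


def missing_utm_fields(url):
    """Return the required UTM parameters that a tagged URL lacks.

    Untagged URLs return an empty list, because Google Analytics
    falls back on the referral information for them.
    """
    # parse_qs drops parameters with blank values, so "utm_medium=" counts as missing.
    params = parse_qs(urlsplit(url).query)
    if not any(key.startswith("utm_") for key in params):
        return []
    return [name for name in REQUIRED if name not in params]


print(missing_utm_fields("https://example.com?utm_source=google-my-business"))
```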

Bot traffic

Another cause of polluted data in Google Analytics is bot traffic. Bot traffic can often be diagnosed when you see high levels of traffic over one or two days that does not exhibit human behaviour. For instance, does this traffic all come from one source or medium but bounce right away? Is the time on page 0:00? Does it have “not set” for the location of the visit?

Any sudden, unexpected increase in traffic from one particular source should be investigated. It might be genuine traffic, but if it is not visiting other pages, or interacting with the site, then chances are it is bot traffic.

You may also notice a lot of traffic coming from strange sites in your “Referrals” report. The website name might be spelled incorrectly, be of an adult nature (unrelated to your industry), or be an obvious spam site such as “get-free-seo-links”.
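The questions above can be turned into a rough screening rule for data exported from the Source/Medium report. This is only a sketch: the row structure, the sample figures and the thresholds are all assumptions chosen for illustration, and anything it flags still needs a human look.

```python
from dataclasses import dataclass


@dataclass
class SourceRow:
    source_medium: str
    sessions: int
    bounce_rate: float          # 0.0 to 1.0
    avg_session_seconds: float


def suspicious_sources(rows, min_sessions=50, min_bounce=0.95, max_seconds=1.0):
    """Flag sources with many sessions that nearly all bounce immediately."""
    return [
        row.source_medium
        for row in rows
        if row.sessions >= min_sessions
        and row.bounce_rate >= min_bounce
        and row.avg_session_seconds <= max_seconds
    ]


# Hypothetical export rows, not real data.
rows = [
    SourceRow("google / organic", 1200, 0.48, 95.0),
    SourceRow("getbottraffic4free.xyz / referral", 400, 1.0, 0.0),
    SourceRow("t.co / referral", 30, 0.97, 0.5),
]
print(suspicious_sources(rows))
```

The session threshold keeps small sources from being flagged on the strength of a handful of bounced visits.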

Self-referrals

Another cause of inaccurate data in Google Analytics is self-referrals. This is when you notice that one of the referral sources in your “referral” channel is your own website.

There can be several reasons for this, but they usually come down to not tracking everything correctly on your website. For instance:

  • There isn’t a Google Analytics code on every page, therefore some pages aren’t categorised as part of the website for the purposes of Google Analytics. Page A contains the code, Page B does not. A visitor goes from Page A to Page B and back to Page A again. Page B will register as a referral source.
  • You have not set up cross-domain tracking correctly. If you are trying to track visits across two domains that you own, example.com and exampleshop.com, but the domains are not added to the Google Analytics “referral exclusion” list, then Google Analytics will track them as separate referral sources.
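To spot self-referrals quickly, you can compare the sources in an exported “Referrals” report against the domains you own. A minimal sketch, assuming the sources are plain hostnames; the function name and sample values are hypothetical.

```python
def self_referrals(referral_sources, own_domains):
    """Return the referral sources that are one of your own domains or a sub-domain of one."""
    own = [domain.lower() for domain in own_domains]
    hits = []
    for source in referral_sources:
        host = source.lower()
        # Match the domain itself or any sub-domain, but not look-alikes
        # such as "notexample.com".
        if any(host == domain or host.endswith("." + domain) for domain in own):
            hits.append(source)
    return hits


sources = ["example.com", "shop.example.com", "t.co", "exampleshop.com", "notexample.com"]
found = self_referrals(sources, ["example.com", "exampleshop.com"])
print(found)
```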

Internal traffic

Depending on how often your staff are using your website, their traffic might be polluting your data. For instance, if your customer service staff frequently refer to your website’s “FAQ” section to answer customer support questions then every visit of theirs to those pages may be reported in Google Analytics. This could lead you to believe your FAQ pages are more popular with customers and prospective customers than they actually are.

Similarly, if your staff often receive phone calls asking for technical specifications of products and they go to your website to find the answers on the product pages then it could lead you to believe that the product pages have a low conversion rate. This can get more confusing and potentially inaccurate when staff use external search engines, or even click on PPC ads, to arrive at the product pages.

Fixes

If you notice that there are issues with the way your Google Analytics account is attributing traffic then don’t worry. There are fixes available to ensure it tracks correctly going forwards.

It is important to remember that any changes you make to the way traffic is categorised in Google Analytics will make it less accurate to compare data before the fix with after the fix. Make sure you annotate your Google Analytics view to show when and what changes were implemented to improve the accuracy of data.

Fixing wrongly categorised traffic

Traffic that is being miscategorised by Google Analytics natively can be fixed in two ways.

Incorrectly attributed organic search traffic

If there is traffic displaying in the “Acquisition” > “All traffic” > “Referrals” report that you are certain came from a search engine, you can use Google Analytics’ “Organic Search Sources” to reclassify that traffic as coming from an organic medium going forward.

Before you do this, make sure the traffic is definitely coming from the organic search listings. Just because traffic is coming from “Google” does not mean it is necessarily organic traffic. For instance, you can often see “google.com” as a referral source. This might well be visitors from other Google products such as Google Docs. Reclassifying this referral source as an organic search visit would be wrong.

If you are certain that the referral source should be classed as organic traffic then follow Google’s guidelines on using the “Organic Search Sources” function. Something key to note – the order of your search engines listed in the Organic Search Sources may impact how different sources from the same search engine are classified. For instance, Google’s guidelines mention that if both their image search and standard organic search results use the same query parameters then you may end up accidentally classifying all image search results traffic as standard organic results traffic.

“To change this attribution, you can reorder these search engines in the list to prioritize how sessions are attributed. In this example, you could list images.google.com before google.com so searches are properly attributed.”
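The effect of list order can be illustrated with a simplified model in which the first matching entry wins. This is a sketch of the idea only, not Google Analytics’ actual implementation; the matching rule (domain contains, query parameter present) and the sample referrer are assumptions.

```python
from urllib.parse import parse_qs, urlsplit


def classify(referrer, sources):
    """Return the first (domain_contains, query_param) entry that matches the referrer."""
    parts = urlsplit(referrer)
    host = parts.hostname or ""
    params = parse_qs(parts.query)
    for domain_contains, query_param in sources:
        if domain_contains in host and query_param in params:
            return domain_contains
    return None


referrer = "https://images.google.com/search?q=red+shoes"

# "google.com" is contained in "images.google.com", so it matches first here.
wrong_order = [("google.com", "q"), ("images.google.com", "q")]
# Listing the more specific domain first lets image search be attributed separately.
right_order = [("images.google.com", "q"), ("google.com", "q")]

print(classify(referrer, wrong_order))
print(classify(referrer, right_order))
```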

Other incorrectly attributed traffic

If the miscategorised traffic is not supposed to be [search engine]/organic, it cannot be fixed using “Organic Search Sources”, and you may need to use filters instead.

Filters permanently change the way data is reported in Google Analytics. Set up a test view in your account to check that the filter is behaving as you would expect. Then use it on your reporting view.

A common example of when filters need to be used to reclassify traffic is for social media platforms. Twitter, Facebook and LinkedIn traffic is often seen in both the “Referral” and “Social” channels. This is due to the fact that these platforms send traffic from a variety of different sub-domains. Google Analytics only recognises a few of them as being social media sources.

For instance, a look at your “Referral” sources may show the likes of:

  • m.facebook.com
  • t.co
  • facebook.com
  • linkedin.com
  • l.facebook.com

This is due to the way the social media platforms handle links out from their site. Facebook uses “link shims”, Twitter uses its link shortener “t.co”, and LinkedIn has its own version too. This is designed both for security of users clicking on the link, and to make it easier to share links. Google Analytics does not automatically recognise traffic clicking on these shortened URLs as having originated from a social media source.

You have to tell it to classify traffic that way.

This can be achieved using filters. It can be a messy process, but this guide by Edit takes you through it step by step; just change the sources and mediums to fit your case.
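Filter patterns in Google Analytics are written as regular expressions, so it is worth testing a pattern against your own referral sources before putting it into a filter. The pattern below is a sketch covering only the hostnames listed above, and is an assumption rather than a complete list of social sub-domains.

```python
import re

# Hypothetical pattern: any single sub-domain of facebook.com (m., l., ...),
# Twitter's t.co shortener, and linkedin.com with or without "www.".
SOCIAL_PATTERN = re.compile(
    r"^(?:(?:[a-z]+\.)?facebook\.com|t\.co|(?:www\.)?linkedin\.com)$"
)


def is_social(hostname):
    """Return True if the referral hostname matches the social pattern."""
    return bool(SOCIAL_PATTERN.match(hostname.lower()))


for host in ["m.facebook.com", "t.co", "facebook.com", "linkedin.com", "l.facebook.com", "example.com"]:
    print(host, is_social(host))
```

Anchoring the pattern with ^ and $ stops it from catching unrelated hostnames that merely contain “t.co” or “facebook.com”.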

Fixing badly used UTMs

UTMs used on a website to track data going to other pages on that same site should be removed. If you are looking to track interactions with different elements on a website then use Google Tag Manager instead.

The data lost or changed through using UTM parameters on internal links cannot be recovered. It’s gone forever. Stop using UTMs on your internal links.

If you have UTMs set up incorrectly on external websites, find the links and change them.

If you cannot change them, you can use filters to correct the data going forwards. You may have a UTM parameter somewhere in the wild with the medium set as “Affiliate” rather than “affiliate”. This will result in two separate channels in Google Analytics, as it is case sensitive. You can, however, use filters to rewrite “Affiliate” to “affiliate”, using the same method as above.
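For data that was collected before the filter was in place, you can at least merge the split rows when analysing an export. The sketch below assumes a simple mapping of medium to session count; the figures are made up for illustration.

```python
from collections import Counter


def merge_by_lowercase(session_counts):
    """Combine session counts for mediums that differ only by letter case."""
    merged = Counter()
    for medium, sessions in session_counts.items():
        merged[medium.strip().lower()] += sessions
    return dict(merged)


# Hypothetical export: the same medium recorded with two different casings.
exported = {"Affiliate": 40, "affiliate": 60, "organic": 100}
merged = merge_by_lowercase(exported)
print(merged)
```

This only tidies up the exported numbers; the data inside the Google Analytics view itself stays as it was recorded.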

Excluding bot and referral spam traffic

You may be aware that Google Analytics can filter out most commonly known bots from your data. It is done simply with the check of this box:

Bot filtering checkbox in Google Analytics

Found under “Admin” > “View settings”.

This won’t stop all bots from impacting your data, however. If you have determined that a referral source is in fact a bot, you can filter it out.

Referral spam filter in Google Analytics

Keep checking over time to see if the filter is working, and add additional filters when new bots arise. If you end up receiving a lot of traffic from spam sites, you might want to follow this excellent Moz post on “Ghost referral spam”.
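As the list of spam domains grows, it can be easier to generate the filter pattern from a list than to maintain it by hand. This is a minimal sketch; the domain list reuses the examples from this article, and the function name is hypothetical.

```python
import re

# Spam sources identified during the audit (examples from this article).
SPAM_DOMAINS = ["getbottraffic4free.xyz", "get-free-seo-links"]


def build_exclude_pattern(domains):
    """Join the domains into one regular expression, escaping dots and other special characters."""
    return "|".join(re.escape(domain) for domain in domains)


pattern = build_exclude_pattern(SPAM_DOMAINS)
print(pattern)

# Check the pattern against some sources before using it in a filter.
for source in ["getbottraffic4free.xyz", "t.co", "google"]:
    print(source, bool(re.search(pattern, source)))
```

Escaping matters because an unescaped dot in a regular expression matches any character, which could exclude more traffic than intended.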

Excluding self-referrals

Seeing your own website as a referral source? You might not have Google Analytics tracking on every page of your site. Check whatever method you are using to add the code to the site. Is it actually adding it to every new page?

If you have more than one domain, your cross domain tracking might not be set up correctly. Make sure you have followed every step of Google’s guide to setting up cross domain tracking.

Excluding internal traffic

Making sure your own staff’s actions on your website are not affecting your data is simple. Make sure to capture the IP address of your office and any regular remote working spots. Use this information to set up internal traffic filters.
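Before setting up the filters, it helps to keep a single list of the addresses you consider internal and to check sample addresses against it. A minimal sketch using Python’s standard ipaddress module; the networks shown are reserved documentation ranges standing in for your real office addresses.

```python
import ipaddress

# Placeholder ranges: replace with your office and remote-working addresses.
OFFICE_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),    # office range (example only)
    ipaddress.ip_network("198.51.100.42/32"),  # single remote address (example only)
]


def is_internal(ip):
    """Return True if the IP address falls inside any listed network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in OFFICE_NETWORKS)


print(is_internal("203.0.113.17"))
print(is_internal("192.0.2.5"))
```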

Final note

Remember that with Google Analytics it is always recommended to have a view that has no filtering on it whatsoever. This will help if you notice that you have been accidentally excluding or changing data using filters. Remember, you will not be able to recover that data from the view those filters are applied to. It is also recommended to set up a testing view. This way you can trial filters and other changes for a period of time. Once you are happy that they are working as expected, you can roll out the fixes to your reporting view.

Google Analytics is an amazing source of data about your users and website. Make sure it is as accurate as it can be with regular auditing and clean-ups.

Written By
Helen is Managing Director of Arrows Up, an SEO training and consultancy agency. Helen has over ten years of experience in digital marketing, SEO and analytics. She regularly speaks at industry events and loves sharing her knowledge through teaching and writing.