Measuring the Effectiveness of Site Architecture Improvements

The benefits of a lean, well-organised site architecture are obvious from both an SEO and user perspective.

A recent post by Cyrus Shepard did a great job of outlining the actions you can take to improve site architecture and the potential impact that this can have on traffic and conversions. In this post, Cyrus concluded that with these site architecture optimisation techniques “you can test, measure, and change it over time” and that “it often takes more than one attempt to boost SEO performance”.

Seeing as site architecture optimisation is an iterative process, I’d like to propose a way of measuring and monitoring the impact of these changes that looks at more than just site traffic.

What’s the aim?

Before we dive into the measurement process itself, let’s establish exactly what we want to achieve by improving site architecture.

Ultimately, we are aiming to construct the simplest possible site with the highest proportion of traffic-driving pages. This means guiding users and search engines towards a site’s preferred, high-value pages while minimising the visibility of low-value, junk pages.

In order to track our progress towards this goal, we need to:

  1. Compile the most complete picture possible of each page on the site
  2. Decide on our preferred and non-preferred pages
  3. Categorise pages based on their visibility
  4. Implement optimisation initiatives accordingly and repeat the process

Getting a complete picture

The first stage in this process is about discovering every single page on the site you’re optimising and pulling in as much information as possible for each URL.

I’ve written about this in a previous post, but the core concept involves crawling your site and integrating a variety of data sources to build the fullest possible picture of how each page is performing.

Some of the key data sources you’ll want to include are listed below, with a rough sketch of how they might be combined following afterwards:

  • Organic search performance data via Google Search Console
  • Web analytics data via Google Analytics or Adobe Analytics
  • Backlink metrics via Majestic
  • Googlebot data via log file analysers such as Splunk or Logz.io
  • Additional sources uploaded manually, such as XML sitemaps or URL lists

DeepCrawl Search Universe
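To make this more concrete, here’s a minimal sketch of how these sources might be joined at URL level once each has been exported to CSV. The file names and column names are assumptions for illustration, not any tool’s actual export format.

```python
# A rough sketch: combine per-URL exports into one inventory.
# All file and column names below are assumptions for illustration.
import pandas as pd

crawl = pd.read_csv("crawl.csv")             # e.g. url, status_code, indexable
gsc = pd.read_csv("search_console.csv")      # e.g. url, impressions, clicks
analytics = pd.read_csv("analytics.csv")     # e.g. url, organic_visits
backlinks = pd.read_csv("backlinks.csv")     # e.g. url, referring_domains
logs = pd.read_csv("googlebot_hits.csv")     # e.g. url, googlebot_hits

# Left-join everything onto the crawl, which should be the most complete URL list.
pages = crawl
for source in (gsc, analytics, backlinks, logs):
    pages = pages.merge(source, on="url", how="left")

# A missing value simply means the URL didn't appear in that source.
pages = pages.fillna(0)
pages.to_csv("url_inventory.csv", index=False)
```

The result is one row per URL, which is the foundation for every check in the rest of this process.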

Separating high and low quality pages

Once you know the full extent of the site’s URL inventory you can start categorising pages based on the value they provide. A simple way of separating out a site’s pages is to categorise them as preferred or non-preferred.

Preferred pages are those that you want search engines to crawl and index and that users should visit. As such, you want to take steps to increase the visibility of these pages.

Conversely, non-preferred pages include all other pages on the site. Some of these pages will need to be removed or steps may need to be taken to prevent them from being crawled and indexed by search engines. Examples of non-preferred pages could include parameter URLs, duplicate pages and legacy content.

Separating pages out in this way makes it clear which pages should be given more prominence and which should be given less visibility or removed entirely.
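As a rough illustration, this split can often be approximated with a handful of URL rules before anyone reviews the output by hand. The rules below (parameter URLs and a hypothetical legacy section) are assumptions you would replace with patterns specific to your own site.

```python
# A minimal sketch of a rule-based preferred / non-preferred split.
from urllib.parse import urlparse, parse_qs

LEGACY_PREFIXES = ("/archive/", "/old-site/")   # hypothetical legacy sections

def is_preferred(url: str) -> bool:
    """Crude classification based on URL patterns alone."""
    parsed = urlparse(url)
    if parse_qs(parsed.query):                   # parameter URLs
        return False
    if parsed.path.startswith(LEGACY_PREFIXES):  # legacy content
        return False
    return True

for url in (
    "https://example.com/products/widget",
    "https://example.com/products/widget?sort=price",
    "https://example.com/archive/2012-promo",
):
    print(url, "->", "preferred" if is_preferred(url) else "non-preferred")
```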

Categorising based on visibility

This next step is the key part of tracking the progress of site architecture optimisation initiatives and involves categorising pages based on search engine and user visibility.

Think of optimising your preferred pages as part of a cycle in which five criteria need to be met. The aim should be to make preferred pages:

  • Valid – Returning a page with a 200 response code.
  • Indexable – Able to be indexed by search engines.
  • Primary – Either featuring unique content or signals indicating it is the canonical version.
  • Discovered – Receiving requests from search engine bots, as seen in log files.
  • Visited – Receiving traffic from site visitors.

site architecture optimisation cycle

If your preferred pages are not able to pass through each of these stages, then you will need to diagnose where the blockers lie and how you can get them through to the next stages.

Equally, non-preferred pages should be investigated if they are travelling too far through the stages in the diagram above, e.g. by adding canonical tags pointing to the preferred version of a page, or by noindexing a low-quality page that should not appear in search results.
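One simple way to track this over time is to record, for each URL, the furthest stage it currently reaches and watch how those counts shift between crawls. Here’s a minimal sketch, assuming a per-URL record like the inventory described earlier; the field names are illustrative.

```python
# A rough sketch: assign each URL to the furthest stage it reaches.
# Field names are assumptions, not a specific tool's export format.
def furthest_stage(page: dict) -> str:
    if page.get("status_code") != 200:
        return "not valid"
    if not page.get("indexable", False):
        return "valid"
    if not page.get("primary", False):
        return "indexable"
    if page.get("googlebot_hits", 0) == 0:
        return "primary"
    if page.get("organic_visits", 0) == 0:
        return "discovered"
    return "visited"

example = {"status_code": 200, "indexable": True, "primary": True,
           "googlebot_hits": 14, "organic_visits": 0}
print(furthest_stage(example))  # -> discovered
```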

Let’s take a look at each of these stages in a bit more detail, including some of the issues that you’ll want to look out for.

Make it valid

Goal: To make preferred URLs return valid 200 responses and to take non-preferred, junk URLs out of action.

What pages should I flag?

  • Preferred pages returning non-200 status codes.
  • Non-preferred pages returning 200 status codes.

What should I be looking for?

  • 301 Redirects & non-301 redirects
  • Malformed URLs
  • Pages with a high fetch time
  • Non-HTML pages
  • Excessively long URLs
  • Failed URLs
  • Broken pages (4xx Errors)
  • Unauthorised pages
  • 5xx errors
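As a rough illustration of this check, the sketch below fetches a handful of preferred URLs and flags anything that doesn’t return a 200. The URL list is made up for the example, and on a real site you would normally take status codes from your crawl data rather than re-fetching every page.

```python
# A minimal sketch: flag preferred URLs that don't return a 200 response.
# The URLs below are illustrative.
import requests

preferred_urls = [
    "https://example.com/",
    "https://example.com/products/widget",
]

for url in preferred_urls:
    try:
        response = requests.get(url, allow_redirects=False, timeout=10)
    except requests.RequestException as exc:
        print(f"FAILED  {url} ({exc})")
        continue
    if response.status_code != 200:
        print(f"FLAG    {url} returned {response.status_code}")
```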

Make it indexable

Goal: To make preferred URLs indexable and prevent non-preferred URLs from being indexed.

What pages should I flag?

  • Preferred pages that aren’t indexable.
  • Non-preferred pages that are indexable.

What should I be looking for?

  • Disallowed pages
  • Noindexed pages
  • Canonicalized pages
  • Paginated 2+ pages
  • HSTS canonicalized
  • Mobile alternates

Make it primary

Goal: To make preferred URLs the primary or unique version, while removing signals that identify non-preferred pages as primary or unique.

What pages should I flag?

  • Preferred pages that aren’t unique or primary.
  • Non-preferred pages that are unique or primary.

What should I be looking for?

  • Duplicate page sets
  • Duplicate title sets
  • Duplicate description sets
  • Duplicate body sets
  • Missing titles
  • Short titles
  • Missing descriptions

Make it discovered

Goal: To get preferred URLs crawled by search engines and to minimise crawl budget wasted on non-preferred URLs.

What pages should I flag?

  • Preferred pages that aren’t receiving Googlebot requests.
  • Non-preferred pages that are receiving Googlebot requests.

What should I be looking for?

  • Indexable pages without Googlebot hits
  • Indexable pages without search impressions
  • Pages without backlinks
  • Non-indexable pages with backlinks
  • Orphaned pages
  • Nofollowed pages

Make it visited

Goal: To get preferred URLs receiving visits from organic search and to minimise visits to non-preferred URLs.

What pages should I flag?

  • Preferred pages that aren’t receiving visits.
  • Non-preferred pages that are receiving visits.

What should I be looking for?

  • Pages that drive traffic
  • Mobile pages driving traffic
  • Desktop pages driving traffic
  • Broken pages with traffic
  • Redirecting pages with traffic
  • Disallowed URLs with traffic
  • Non-indexable pages with traffic
  • Non-indexable pages with search impressions

Rinse and repeat

In summary, the process I’ve outlined will enable you to build a complete view of your site’s URLs and categorise them into high- and low-value pages. From there you can map each page against the five stages above and implement initiatives that leave the site with a larger proportion of high-quality, traffic-driving pages.

Hopefully this process can help you more effectively monitor the progress of site architecture optimisation initiatives. The key is to repeat the process regularly so you can keep a close eye on performance fluctuations beyond traffic, and so that you can quickly diagnose and resolve new issues as they arise.

About Sam Marsden

Sam Marsden is SEO & Content Manager at DeepCrawl and writer on all things SEO. DeepCrawl is the world’s most comprehensive website crawler, providing clients with a complete overview of their websites’ technical health.