A Crawl-Centred Approach to Content Auditing

28th February 2018

It’s 2018 and content marketing is still on the rise in terms of interest and in the revenue it generates. 75% of companies increased their content marketing budget in 2016 and 88% of B2B sites are now using content marketing in various forms.

Measuring ROI and content effectiveness are seen by B2B companies as more of a challenge than a lack of budget. Generally speaking, the issues now faced with content marketing appear to be less about buy-in and more about showing the value of content marketing initiatives.

[Graphic: Challenges for B2B Content Marketers]

Increased Need for Content Audits

The longer companies invest in content marketing initiatives, the larger their sites are likely to become, meaning that the need to review and audit content is more important than ever.

I’m going to put forward a crawl-centred process for reviewing content that will give you a structure for answering these four questions:

  1. What content is/isn’t working?
  2. What should you do with content that isn’t working well?
  3. How can you get the most out of content that is working well?
  4. How can you find insights that will inform your content strategy?

Answering these questions will put you in a position to optimise and maintain site content, as well as helping you implement a data-driven approach that ensures content marketing resources are invested efficiently.

Content Discovery

The first phase of a content audit is the discovery phase. This involves finding all of the content on your site and combining it with a range of relevant data sources that will inform you about the performance of each page.

Starting with a crawl

While there are many ways you can achieve this, the simplest is to start with a crawl of the site. Using a crawler like DeepCrawl, you can crawl sites of any size and easily bring in a whole host of additional data sources to find pages that a simple web crawl might miss.
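To make the idea concrete, the core of any site crawl is a breadth-first traversal of internal links. Here's a minimal sketch in Python, using a hypothetical in-memory site in place of real HTTP fetches; a real audit would lean on a dedicated crawler rather than hand-rolled code like this:

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical stand-in for fetching pages over HTTP; maps URL -> HTML.
PAGES = {
    "/": '<a href="/blog">Blog</a><a href="/about">About</a>',
    "/blog": '<a href="/">Home</a><a href="/blog/post-1">Post</a>',
    "/about": '<a href="/">Home</a>',
    "/blog/post-1": '<a href="/blog">Back</a>',
}

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

def crawl(start="/"):
    """Breadth-first traversal of internal links - the heart of a crawl."""
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        parser = LinkExtractor()
        parser.feed(PAGES.get(url, ""))
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)
```

Every page the traversal discovers becomes a row in your audit spreadsheet later on.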

Integrating Additional Data Sources

The graphic below details some of the data sources you will want to bring in alongside a crawl, but don’t treat this as an exhaustive list. Include any data sources and metrics that will help you assess the performance of your content, which may also cover social shares, sitemaps, estimated search volume and SERP data.

[Graphic: DeepCrawl Search Universe]

The beauty of using a crawler like DeepCrawl is that it saves you the effort of pulling most of these data sources together manually (and crashing Excel in the process). Once you’ve run a crawl with other data sources included, you can simply export the full dataset into a spreadsheet with pages as rows and metrics as columns.
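As a rough sketch of what that combination looks like programmatically, here's how you might merge a crawl export with analytics and backlink data in pandas. The column names and figures are illustrative, not DeepCrawl's actual export schema:

```python
import pandas as pd

# Illustrative crawl export: one row per discovered page.
crawl = pd.DataFrame({
    "url": ["/a", "/b", "/c"],
    "word_count": [1200, 300, 850],
})

# Illustrative secondary sources, e.g. analytics and backlink exports.
analytics = pd.DataFrame({"url": ["/a", "/b"], "sessions": [5400, 120]})
backlinks = pd.DataFrame({"url": ["/a", "/c"], "backlinks": [34, 2]})

# Left-join onto the crawl so every crawled page keeps its row,
# even when a secondary source has no data for it.
audit = (crawl
         .merge(analytics, on="url", how="left")
         .merge(backlinks, on="url", how="left")
         .fillna(0))
```

The left joins matter: pages with zero sessions or zero backlinks are often exactly the ones the audit needs to surface.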

Using Custom Extractions

Another benefit of using a more advanced crawler is that you can use custom extractions to pull out data such as author name, out-of-stock items, published date, last modified date, meta keywords, breadcrumbs and structured data.
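A custom extraction is essentially a pattern applied to every page's HTML. Here's a minimal sketch using regexes; real crawlers typically let you define these as regex, XPath or CSS selectors, and the meta tags below are illustrative:

```python
import re

# Illustrative page source containing author and publish-date metadata.
html = """
<meta name="author" content="Sam Marsden">
<meta property="article:published_time" content="2018-02-28">
"""

# Each extraction is a pattern with one capture group, applied per page.
author = re.search(r'name="author" content="([^"]+)"', html)
published = re.search(r'published_time" content="([^"]+)"', html)

extracted = {
    "author": author.group(1) if author else None,
    "published": published.group(1) if published else None,
}
```

Run across a whole crawl, each extraction becomes another column alongside your performance metrics.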

[Image: DeepCrawl Custom Extraction]

The Refining Phase

At this point you’ll need to take that bloated spreadsheet, full of performance data, and shrink it into something more manageable. The aim of this phase is to reduce the data down so that you’re in a position to start assessing and reviewing pages.

This involves removing pages (rows) from the spreadsheet that sit outside the scope of the content audit and getting rid of superfluous metrics (columns) which aren’t going to provide you with valuable insights. If you aren’t sure whether a metric or page should be included, you can always hide it rather than delete it.
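A sketch of this refining step in pandas, with illustrative URLs, scoping rules and columns (your own will differ):

```python
import pandas as pd

# Illustrative combined audit sheet before refining.
audit = pd.DataFrame({
    "url": ["/blog/a", "/blog/b", "/tag/seo", "/checkout"],
    "sessions": [5400, 120, 15, 900],
    "meta_keywords": ["", "", "", ""],  # superfluous: carries no signal
})

# Remove rows outside the audit's scope - here, anything that isn't
# a blog content page (tag pages, checkout flows, etc.).
refined = audit[audit["url"].str.startswith("/blog/")].copy()

# Remove columns that won't inform any decision.
refined = refined.drop(columns=["meta_keywords"])
```

In a spreadsheet the equivalent is filtering rows and hiding or deleting columns; doing it in code just makes the scoping rules repeatable.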


Now that you’ve got a workable dataset, let’s see how you can go about answering those four questions.

What content on the site is/isn’t working?

To understand content performance you will want to decide on a set of criteria that define success and failure. The metrics you choose will be dependent on the goal of your content and what you’re trying to achieve, but will likely cover traffic, social shares, engagement, conversions or a mixture of some or all of these.

Once you’ve made this decision, you can define buckets based on varying levels of those metrics (e.g. outstanding, good, average, poor) and apply them to the pages you’re assessing.

Now you will be able to see how content is performing based on the metrics you care about.
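A sketch of that bucketing in pandas using `pd.cut`; the session thresholds here are illustrative, and in practice you'd set them from your own data's distribution:

```python
import pandas as pd

# Illustrative pages with a chosen success metric (sessions).
pages = pd.DataFrame({
    "url": ["/a", "/b", "/c", "/d"],
    "sessions": [9500, 2100, 400, 25],
})

# Bucket boundaries are assumptions for illustration - derive your own.
bins = [0, 100, 1000, 5000, float("inf")]
labels = ["poor", "average", "good", "outstanding"]
pages["bucket"] = pd.cut(pages["sessions"], bins=bins, labels=labels)
```

Sorting or filtering on the bucket column then gives you an at-a-glance view of what's working and what isn't.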

How can you deal with content that isn’t performing well?

Now that your awesome and poor-performing pages are right there in front of you, you’ll need to decide how to deal with each of them.

I’d start by adding in an ‘Action’ column to your spreadsheet and creating a dropdown list of different decisions. These could include:

  • Keep – Pages that are performing well and will not be changed significantly
  • Cut – Low value pages that don’t deserve a place on your site e.g. outdated content
  • Combine – Pages that include content that doesn’t warrant its own dedicated page but can be used to bolster another existing page
  • Convert – Pages with potential that you want to invest time improving e.g. partially duplicate content
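A sketch of filling that ‘Action’ column from simple rules; the rules here (bucket plus word count) are illustrative, not a fixed methodology:

```python
import pandas as pd

# Illustrative pages with performance buckets already assigned.
pages = pd.DataFrame({
    "url": ["/a", "/b", "/c", "/d"],
    "bucket": ["outstanding", "average", "poor", "poor"],
    "word_count": [1800, 900, 150, 700],
})

def decide(row):
    """Map a page's bucket (and a thin-content check) to an action."""
    if row["bucket"] in ("outstanding", "good"):
        return "Keep"
    if row["bucket"] == "poor" and row["word_count"] < 300:
        return "Combine"  # thin content: fold into a stronger page
    if row["bucket"] == "poor":
        return "Cut"
    return "Convert"      # middling pages with potential

pages["action"] = pages.apply(decide, axis=1)
```

For smaller sites you'd override these rule-based defaults page by page; for larger ones, rules like these are how the process stays manageable.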


For small- to medium-sized sites you should be able to make these decisions on a page-by-page basis, but for larger sites it may be easier to aggregate pages into groups so that it remains a manageable process.

What actions can you take to get the most out of content that is performing well?

Now that you’ve decided what actions you’re going to take for each of your pages, you’ll want to filter down your pages by those that you’re going to keep, and look at ways that you can get the most out of them.

This is an exercise in content optimisation: tuning up the content you want to keep. There are a ton of great resources covering this subject, so I won’t go into detail here. However, you may want to look at how you can improve the following areas:

Optimising titles & meta descriptions

– Bread and butter stuff, but are your titles and descriptions appealing propositions? Do they match the user intent of the queries that they rank for?

Keyword cannibalisation

– Do you have multiple pages competing to rank for topically similar queries that could be consolidated to maximise your authority on the subject?

Resolving content duplication issues

– Is content on your site unique? Are there near or true (complete) duplicate versions which could be diluting the authority of the pages that are performing well?


Internal linking & CTAs

– Are there opportunities to link to related pages internally or externally? Do you have relevant CTAs? What do you want visitors to do once they’ve finished with the page they’re on (and do you help get them there)?

Page speed

– Are there any ways you can further optimise pages to reduce load time, e.g. optimising images or cleaning up clunky code?

Structured data implementation

– Is there any useful structured data that you could use to mark up pages, and is existing markup implemented correctly?

How can you enhance your content strategy?

Once you’ve made it to this stage you will know how you’re going to deal with all of your existing content, but how can you use your performance data to inform your content strategy going forward?

The resources that you have for content production are finite, so you need to uncover insights that will help you determine what you should be investing in more, what you should do less of, and what you shouldn’t be producing at all.

You will likely have a good understanding of the relative performance of your content at this point, but it can be helpful to focus on specific metrics and dimensions across the whole dataset to gain deeper insights.

Finding Relationships

You can do this by pivoting variables around important metrics to find relationships that will show you how you can do more of what works and less of what doesn’t.

You’ve already defined your success metrics, but here are some examples of variables you might view these metrics against to find interesting relationships:

  • Performance and engagement by channel/category/content type – Do some types of content perform better than others? Are they viewed or shared more frequently?
  • Content length and engagement – Is word count positively correlated with engagement or is there a drop off point of diminishing returns?
  • Content length and sharing – Does longer content, which is usually more resource intensive, get shared more than short form content? Do the results of long form content justify the larger investment?
  • Performance and engagement by author – Do some authors receive more pageviews and shares than others? This is particularly useful for media organisations where there is a high level of content production and author performance is more important.
  • Performance fluctuations by publish date and time – Is content better received on specific days of the week, times of day or months of the year (if you have data going back far enough)? Can you tailor content publication to times that are likely to get more exposure? For news sites this may mean publishing articles outside of standard working days and hours to catch readers when they have more free time.
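A sketch of one such pivot in pandas, grouping illustrative pages by content type to compare average sessions and shares:

```python
import pandas as pd

# Illustrative audit rows: one per page, with a dimension (content_type)
# and two success metrics (sessions, shares).
pages = pd.DataFrame({
    "content_type": ["guide", "news", "guide", "news", "listicle"],
    "sessions": [4200, 300, 3800, 450, 900],
    "shares": [120, 15, 95, 22, 60],
})

# Pivot the metrics around the dimension to see which type performs best.
summary = pages.pivot_table(
    index="content_type",
    values=["sessions", "shares"],
    aggfunc="mean",
).sort_values("sessions", ascending=False)
```

Swapping the index for author, channel, word-count band or publish day gives you each of the relationships listed above; in a spreadsheet, the equivalent is a pivot table.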

From Audits to Continuous Monitoring

Content audits will vary depending on site size, type and the data you have to hand. However, the above provides a framework for conducting an audit that streamlines and gets the most out of existing content, as well as improving your content strategy going forward.

Furthermore, you should look to achieve this with crawl data at the core, preferably with a tool that automatically integrates additional data sources to make this process as quick and painless as possible.

Once you’ve got this whole process down, you can take it to the next level by automatically pulling in the data on a regular basis and putting it into dashboards. Doing this will enable regular reviews of your content performance and will allow you to adjust the course of your content strategy accordingly.
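As a sketch of what that regular pull might look like: each scheduled run stamps a snapshot of the audit data with its date and appends it to a running history, which a dashboard can then chart over time. The data, dates and storage here are all illustrative:

```python
import pandas as pd

# Illustrative snapshot of key metrics, as produced by one audit run.
snapshot = pd.DataFrame({
    "url": ["/a", "/b"],
    "sessions": [5400, 120],
})

# Each scheduled run stamps the snapshot with its capture date...
run1 = snapshot.assign(captured_at="2018-02-28")
run2 = snapshot.assign(captured_at="2018-03-07")

# ...and appends it to a running history for trend charts.
history = pd.concat([run1, run2], ignore_index=True)

# In practice you'd persist this (e.g. history.to_csv("content_history.csv"))
# on a schedule such as a weekly cron job, then point a dashboard at it.
```

With per-date snapshots in place, the one-off audit becomes continuous monitoring: the same buckets and pivots, tracked over time.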


Written By
Sam Marsden is SEO & Content Manager at DeepCrawl and writer on all things SEO. DeepCrawl is the world’s most comprehensive website crawler, providing clients with a complete overview of their websites’ technical health.