10 Ways To Save Time and Identify Major Site Errors with DeepCrawl

I first met the DeepCrawl team at Brighton SEO, where they were promoting their web crawling tool to conference attendees. The tool is relatively new, and I was lucky to be given a demonstration before using it myself. I cannot cover every aspect of DeepCrawl in this post, but I would like to highlight the areas I found particularly useful.

1) Easy to Use

The DeepCrawl tool allows you to easily crawl a site. The first step is to add a project, as shown below.

Deep Crawl Project

DeepCrawl can crawl sites with up to 5 million URLs in total. The crawl is also customisable, so users are able to set the crawl speed to between 1 and 20 URLs per second (this is in the advanced settings).
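To get a feel for what those speed settings mean in practice, here is a rough back-of-the-envelope sketch. It assumes the crawl runs at a constant rate with no pauses or retries, which a real crawl will not do exactly, and the function name is my own, not part of DeepCrawl.

```python
def estimated_crawl_hours(url_count: int, urls_per_second: float) -> float:
    """Rough crawl duration in hours, assuming a constant crawl rate."""
    if urls_per_second <= 0:
        raise ValueError("urls_per_second must be positive")
    return url_count / urls_per_second / 3600


# Compare the slowest, a middle and the fastest setting for a 100,000 URL site.
for rate in (1, 5, 20):
    hours = estimated_crawl_hours(100_000, rate)
    print(f"{rate:>2} URLs/s: about {hours:.1f} hours")
```

On these assumptions, a 100,000 URL site takes over a day at 1 URL per second but under an hour and a half at 20, which is worth bearing in mind when choosing a crawl window.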

CrawlRate

It is also possible to select exactly when you crawl your own site. Many people choose to crawl overnight or over the weekends as it is faster than during peak hours.

Setting up new project

The tool allows the user to include or exclude certain pages, for example by choosing whether the crawl adheres to nofollow links.

In my case, I wanted DeepCrawl to crawl my entire site and also check the Sitemaps and organic landing pages, so I ticked “Universal crawl”. I clicked ‘save’ and moved to the next section, which confirmed the analytics settings. It picked up the UA ID, and I pressed save to begin the crawl of the site.

2) Simplification

The tool simplifies what may seem complicated to many, especially those not familiar with the technical aspects of SEO. The overview report highlights many aspects of the site that need to be addressed from a top level.

Overview of Deep Crawl

The overview report also shows the number of crawls you have processed on your site. It shows the number of unique pages and the crawl depth, which is especially important on large sites.

3) Identify Indexation

The report clearly shows the number of URLs that have been crawled as well as the number of unique pages on a site and duplicate pages. What I like about the tool is that it also shows the canonicalized pages and the no-followed pages. I find it particularly useful that the tool clearly shows the errors of the site.

Indexation

One clear USP is that DeepCrawl highlights the changes from one crawl to the next, which makes it very easy to see what has changed and what has not. In the above example, the cells in green are from one crawl and the cells in red are from the second crawl of the site.

After running more than one crawl, users can see the trend at the bottom of the dashboard on the Overview tab.

webcrawl depth

4) Identify Content

The DeepCrawl tool clearly shows the meta titles and descriptions on the content tab. It also tells the user if the meta data on the site is over the recommended length. The content tab also shows duplicate body content, as well as pages with missing H1 tags or multiple H1s. The report also identifies whether there are valid Twitter Cards and Open Graph tags. The latter is something I have not seen before in a crawling tool.

Content Overview

5) Clearly See Internal Broken Links

My site was hacked twice last year. Since then, I have done a lot of work to try to resolve this. I had to do a complete reinstallation of the site, which meant many URLs went from sitename/date/post-name to sitename/uncategorized/post-name.

Internal Broken Links

I knew I had internal broken links and have been going through my site slowly to resolve them. This tool has helped to identify the internal broken links, which I will address as I go through my posts. All internal links, external links and redirected links are highlighted on the validation page.

6) Assigning Tasks to Others

This is an aspect of the tool I really like, and it is crucial in project management. The tech team may identify several areas of the site that should be amended. However, due to limited resources (time and money), rectifying these errors may not always be possible. It is therefore best to identify the tasks that can be actioned by a realistic date and assign them to dedicated personnel. The issues can then be seen in the projects section. It is also possible to export the tasks and discuss them with your clients.

Task Reporting

7) Page Level Detail

I found a few duplicate page titles on my blog, mostly due to pagination on the site (e.g. page-2, page-3). For larger e-commerce sites, the page level detail is a useful aspect of the tool, as it makes it easy to see the errors on each page in detail.
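If you have exported a list of URLs and page titles from any crawler, duplicates like these are easy to spot yourself. The snippet below is only a minimal sketch of the idea, not how DeepCrawl does it; the function name and the sample URLs and titles are made up for illustration.

```python
from __future__ import annotations

from collections import defaultdict


def duplicate_titles(pages: dict[str, str]) -> dict[str, list[str]]:
    """Group URLs by normalised title and keep titles used more than once."""
    by_title: dict[str, list[str]] = defaultdict(list)
    for url, title in pages.items():
        # Ignore case and surrounding whitespace when comparing titles.
        by_title[title.strip().lower()].append(url)
    return {t: sorted(urls) for t, urls in by_title.items() if len(urls) > 1}


# Hypothetical export: paginated archive pages sharing one title.
sample = {
    "/blog/": "My Blog",
    "/blog/page-2/": "My Blog",
    "/blog/page-3/": "my blog ",
    "/about/": "About",
}
for title, urls in duplicate_titles(sample).items():
    print(title, "->", ", ".join(urls))
```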

Below is a screengrab of the page level detail. The DeepRank score is out of 10 and measures authority based on the number of links in and out of the page, as well as the number of clicks from the home page. When you combine that with GA data such as site visits, you get an even better idea of which pages to prioritise fixing, because they carry a lot of authority with search engines and are heavily visited by your users.

A DeepRank score of 10 is the most serious; this page is a 3 out of 10. The tick marks show that the page is indexable and unique.

Duplicate Page Titles - Detail

8) Schedule Reports

The ability to schedule reports is very useful, especially if you have a busy work calendar and may forget without reminders. The report is emailed to you once it is complete. Regular reports let you monitor the progress and changes made to the site: for example, you can see whether duplicate page titles, or any other issue flagged at the beginning of the project, have decreased over time. This is particularly important if your client is asking to see the ROI of SEO.

9) Integration with Analytics

When setting up a project, it is possible to integrate the tool with your own analytics at the click of a button. This means no more exporting your own data and trying to match it with crawler data such as broken pages. This makes our job in SEO that much easier.

How does DeepCrawl do this?

It crawls the website’s architecture level by level from the root domain. Then it compares the pages discovered in the architecture to the URLs you are submitting in your sitemaps. Finally, DeepCrawl compares all of these URLs to the Organic Landing Pages in your Google Analytics account.

This is another great USP of DeepCrawl as this feature allows users to find some of the gaps in their site such as:

  1. Sitemaps URLs which aren’t linked internally
  2. Linked URLs which aren’t in your Sitemaps
  3. URLs which generate entry visits but aren’t linked, sometimes referred to as orphaned pages or ghost URLs
  4. Linked URLs or URLs in Sitemaps which don’t generate traffic – perhaps they can be disallowed or deleted
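The four gaps above are essentially set comparisons between three lists of URLs. Here is a minimal sketch of that logic, assuming you already have the crawled URLs, the Sitemap URLs and the organic landing pages as plain sets. The function and key names are my own for illustration and are not DeepCrawl's API.

```python
from __future__ import annotations


def find_gaps(
    crawled: set[str], sitemap: set[str], landing: set[str]
) -> dict[str, set[str]]:
    """Compare linked (crawled) URLs, Sitemap URLs and organic landing pages."""
    return {
        # 1. In the Sitemaps but never reached by following internal links.
        "in_sitemap_not_linked": sitemap - crawled,
        # 2. Linked internally but missing from the Sitemaps.
        "linked_not_in_sitemap": crawled - sitemap,
        # 3. Receive entry visits but are not linked (orphaned pages).
        "orphaned_with_visits": landing - crawled,
        # 4. Linked or in the Sitemaps but generating no organic entry visits.
        "no_organic_visits": (crawled | sitemap) - landing,
    }


# Made-up example data.
gaps = find_gaps(
    crawled={"/a", "/b"},
    sitemap={"/b", "/c"},
    landing={"/a", "/d"},
)
for name, urls in gaps.items():
    print(name, sorted(urls))
```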

By integrating DeepCrawl with your analytics, it can give you an indication as to how important the pages are based on site visits, bounce rate, time on site etc. and therefore which you should probably fix first to have maximum impact.

DC Google Analytics

10) Crawls Before the Site is Live

You may have had a client site that you wanted to crawl before it went live, but could not because it was behind a secure wall. Fortunately, DeepCrawl is here to help. DeepCrawl allows you to crawl the site behind the secure wall and runs a report highlighting any errors. This is particularly important because you can compare the test site to the live site to see what the differences are and where you might lose traffic if you pushed the site live as it was. This means you can easily identify any errors before the site goes live, saving you hours of time and making you look good in front of the client, which is another bonus!

Conclusion

DeepCrawl is a great tool. It clearly shows any issues with your site and makes the complicated and “techie” aspects of the site easy for anyone to understand. If you have difficulty explaining the more technical aspects of a site to the rest of your team, this tool will save you time and simplify the issues, making them easy for your colleagues to understand. At just £50 a month to crawl 100,000 URLs, this is certainly a good deal.



About Jo Turnbull

Jo Turnbull is the organiser of Search London and the founder of SEO Jo Blogs, which provides practical advice and tips for those in SEO.

  • Graham O’Shea

    Hi Jo

    Great article. We also have an enterprise crawler solution at seopler.com

    Our solution is cloud based and offers features such as bulk W3C validation checks.

    Please contact me if you would like an unlimited demonstration account.

    Regards

    Graham