Crawling a website is an extremely valuable part of any SEO’s armoury, and it’s great that we have so many tools available to make the job easier.
Tools such as DeepCrawl, OnPage.org and Screaming Frog are constantly innovating to bring us more advanced data, allowing us to make informed, data-led decisions. Amid all this innovation and constant change, there are many underused hidden gems that could save you a lot of time.
One such feature is the custom search and extraction aspect of the Screaming Frog SEO Spider. Over the past few months, I have been using this feature on a more frequent basis and wanted to share with you some of those key points.
Before we get into that, what exactly is custom search and extraction?
This can be found by navigating to Configuration > Custom > Search
The custom search’s main function is to find anything specific that you want in the source code. You simply need to enter what you would like to search for, ensure that you have selected Contains, select OK and then run the program.
The program will go ahead and run, and pull out any pages that contain your specific input value. The great thing here is that you can also use Regex as part of your search query.
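To give a sense of the kind of regex you might paste into the search box, here is a short Python sketch (my own illustration, not part of Screaming Frog) that flags pages whose source contains a noindex robots meta tag. The pattern is deliberately loose, since attribute order and quoting vary between sites.

```python
import re

# Flags pages whose source contains a noindex robots meta tag.
# Loose on purpose: attribute order and quoting can vary.
NOINDEX_RE = re.compile(r'<meta[^>]+noindex', re.IGNORECASE)

page_source = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
print(bool(NOINDEX_RE.search(page_source)))  # True for pages carrying noindex
```

In Screaming Frog itself you would simply enter the pattern `<meta[^>]+noindex` into a custom search filter; the script above just shows what that pattern matches.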
This can be found by navigating to Configuration > Custom > Extraction
This report allows you to collect any data from the HTML source code of a URL that has been crawled by the tool. For this to work, the page must return a 200 status code.
Currently, you are able to have ten different extractions from the HTML source code at any one time. You can name these extractions as you see fit to ensure that they fit the requirements of your specific crawl.
The program currently supports the following methods to extract data:
- XPath: XPath selectors, including attributes.
- CSS Path: CSS Path and optional attribute.
- Regex: Regular expressions, for more advanced uses.
If you have selected either XPath or CSS Path to collect the required data, you have the option to choose what to extract:
- Extract HTML Element: The selected element and its inner HTML content.
- Extract Inner HTML: The inner HTML content of the selected element. If the selected element contains other HTML elements, they will be included.
- Extract Text: The text content of the selected element and the text content of any sub-elements.
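To make the difference between the three options concrete, here is a small Python illustration (not Screaming Frog's own code) of what each mode returns for a simple, well-formed fragment:

```python
import xml.etree.ElementTree as ET

# A made-up product price fragment to demonstrate the three extraction modes.
fragment = '<div class="price"><span>£</span>49.99</div>'
elem = ET.fromstring(fragment)

# "Extract HTML Element": the selected element plus its inner HTML content.
extract_html_element = ET.tostring(elem, encoding='unicode')

# "Extract Inner HTML": everything inside the selected element, tags included.
inner_html = (elem.text or '') + ''.join(
    ET.tostring(child, encoding='unicode') for child in elem
)

# "Extract Text": the text of the element and all of its sub-elements.
extract_text = ''.join(elem.itertext())

print(extract_text)  # £49.99
```

So for this fragment, Extract HTML Element returns the whole `<div>`, Extract Inner HTML returns `<span>£</span>49.99`, and Extract Text returns just `£49.99`.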
That is a quick overview of what custom search and extraction are, so let’s now jump into what you can do with them.
Below I have listed four ways that I have been using it over the past few months. I am sure there are many more ways and I would love to hear about them in the comments below.
1. Checking GA & GTM implementation
Let’s start easy! There is a regular requirement to ensure that all your tracking is implemented and stays in place.
Using custom search, you can check whether the Google Analytics or Google Tag Manager code is implemented and, if not, which pages it is missing from.
To do this you need to open up the custom search feature and add in the specific code that you are looking for as shown below.
You can add up to ten different filters, so feel free to include more than one snippet if you need to. Once you are done adding the code, hit OK and run the program. If you head over to the Custom tab within Screaming Frog, you should see it populate with any pages where the code does not exist.
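If you prefer to sanity-check crawl exports outside the tool, the same idea can be sketched in a few lines of Python. The ID patterns below are placeholders for a classic GA property ID and a GTM container ID; substitute your own.

```python
import re

# Placeholder patterns: a "UA-xxxxxxxx-x" GA property ID and a GTM container ID.
GA_RE = re.compile(r"UA-\d{4,10}-\d{1,4}")
GTM_RE = re.compile(r"GTM-[A-Z0-9]{4,8}")

def missing_tracking(pages):
    """Return URLs whose source contains neither the GA nor the GTM snippet."""
    return [url for url, html in pages.items()
            if not (GA_RE.search(html) or GTM_RE.search(html))]

# Toy example: URL -> page source, as you might reassemble from an export.
pages = {
    "/home": "<script>ga('create', 'UA-12345678-1');</script>",
    "/about": "<p>No tags here</p>",
}
print(missing_tracking(pages))  # ['/about']
```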
Useful right? I tend to do this on a regular basis to ensure that I am on top of any tracking issues.
2. Finding rogue canonical tags
When migrating websites, you can sometimes come across old canonical tags that have not been updated or removed on the new URLs. This can cause indexing issues with the search engines and needs to be fixed quickly.
Using the custom search feature, you can identify where these are and send them to your developer to be changed.
This is one of those checks that can be done very quickly and save a huge amount of time.
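The check itself amounts to comparing each page’s canonical href against the domain it should point to. Here is a rough offline sketch of that comparison (an assumed workflow, not a Screaming Frog feature; the domains and the attribute order in the regex are illustrative):

```python
import re

# Assumes href follows rel="canonical" in the tag, which is the common case.
CANONICAL_RE = re.compile(r'<link[^>]+rel="canonical"[^>]+href="([^"]+)"')

def rogue_canonicals(pages, old_domain="old-site.com"):
    """Return URL -> canonical pairs still pointing at the old domain."""
    rogue = {}
    for url, html in pages.items():
        match = CANONICAL_RE.search(html)
        if match and old_domain in match.group(1):
            rogue[url] = match.group(1)
    return rogue

pages = {
    "https://new-site.com/a": '<link rel="canonical" href="https://old-site.com/a">',
    "https://new-site.com/b": '<link rel="canonical" href="https://new-site.com/b">',
}
print(rogue_canonicals(pages))  # {'https://new-site.com/a': 'https://old-site.com/a'}
```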
3. Finding product information
If you have an eCommerce website you may want to regularly check for certain details. In recent times I have been searching for the following:
- SKU numbers
- Product Details
- Pricing information
- Product Spec
To do this, head over to the custom extraction filter as specified earlier. Using XPath as the extraction method, go to a product page and inspect each element containing the information you want to return, as shown below.
In the example above, I right clicked the <td> then hovered over Copy and selected Copy XPath. I then returned to Screaming Frog and pasted it into the correct extraction location as shown below. I continued to do this for the other sections I wanted to gather information for.
Once I was happy with the extraction information, I clicked OK and ran Screaming Frog across the website in question. Whilst running, I headed over to the Custom filter and selected Extraction from the dropdown. Below is an example of the data I was getting back from my crawl.
Once complete and exported into Excel, I was able to filter and pivot the information so I could use it. There are many instances where you may want to use this type of extraction, but below are just a few:
- Creating a product matrix with all the pertinent information in one place
- Crawling competitors to identify product price variation and how they are describing their products
- Rebuilding websites and understanding what information is currently being shown to the user
- Checking that product pages have Schema implemented
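To show what the copied XPath actually retrieves, here is a small Python sketch run outside Screaming Frog. The table structure and class names are invented for illustration; a real DevTools “Copy XPath” will usually give you a longer absolute path such as `/html/body/...//td`.

```python
import xml.etree.ElementTree as ET

# Invented product-table fragment standing in for a real product page.
page = ET.fromstring("""
<table>
  <tr><td class="sku">SKU-1001</td><td class="price">49.99</td></tr>
</table>
""")

# The same kind of selection an XPath extractor performs on each crawled URL.
sku = page.find(".//td[@class='sku']").text
price = page.find(".//td[@class='price']").text
print(sku, price)  # SKU-1001 49.99
```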
4. Finding blog post authors
Want to find out which of your bloggers are getting the most shares and links for content they write? Using the extraction method, you can see what content is being written by each team member.
Once complete, export this information to Excel, where you can combine it with BuzzSumo and Ahrefs exports. Using VLOOKUPs, you will be able to identify which authors are generating the most links and/or shares. You can take this further by adding other information that is available on your blog, such as categories.
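The VLOOKUP step is just a key-based join, so the same roll-up can be sketched in a few lines of Python. All the URLs, names, and share counts below are made up for illustration:

```python
# URL -> author, as extracted by the crawl; URL -> shares, from a social export.
authors = {"/post-1": "Alice", "/post-2": "Bob", "/post-3": "Alice"}
shares = {"/post-1": 120, "/post-2": 45, "/post-3": 80}

# Join on URL and total shares per author.
totals = {}
for url, author in authors.items():
    totals[author] = totals.get(author, 0) + shares.get(url, 0)

print(totals)  # {'Alice': 200, 'Bob': 45}
```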
As mentioned earlier, I am sure there are many more ways that you can use these features to get really useful information, and I am hoping that you will add to the above in the comments below. If you have any questions, feel free to tweet me @danielbianchini.