I’m not kidding about this – I recently came across a website that was blocking the user agent of a very well-known search engine. No one at the company had any idea, and the problem had been quietly costing them a ton of traffic for weeks and weeks.
Time to get the honesty box out: When was the last time you switched user agents? Checked a 304 Not Modified response? Made sure your canonical www redirect was working correctly? Some things are so easily missed in today’s “out of the box” code world. Here are 5 quick checks that are so easily missed, but can save hours of head-scratching!
Check your canonical redirects and domain inventory
Ok, if you’re a seasoned old-timer, there’s nothing new here – but, be honest! When was the last time you checked your canonical redirects? Does your “www” redirect in, or out (depending on which you prefer), with a 301 server header response? Mine does – but I just checked SEOgadget’s for the first time in 6 months. The same tip applies to case redirects, trailing slashes and even your redirected domain inventory. Remember, web server configurations can change, often without the SEO being made aware.
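The redirect check above is easy to script. Here’s a minimal sketch in Python – the helper and the example hostnames are illustrative, not from the original post; in practice `status` and `location` would come from an HTTP request to your non-preferred hostname:

```python
# Sketch: classify a canonical (www) redirect response.
# A healthy canonical redirect is a 301 pointing at the preferred host.

def check_canonical_redirect(status, location, preferred_host):
    """Return True only for a 301 whose Location targets the preferred host."""
    if status != 301:
        return False  # a 302 (or a plain 200) is not a proper canonical redirect
    return preferred_host in (location or "")

# Requesting http://example.com/ when the www version is preferred:
print(check_canonical_redirect(301, "http://www.example.com/", "www.example.com"))  # True
print(check_canonical_redirect(302, "http://www.example.com/", "www.example.com"))  # False
```

Run the same check against uppercase paths and trailing-slash variants to cover the case and slash redirects mentioned above.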
Periodically browse the internet with a different user agent
In the example at the beginning of this post I mentioned a single search engine bot’s user agent being blocked from crawling a site. It’s so rare that I can’t remember the last time the problem cropped up before this! Browsing the internet with your user agent set to, say, MSNbot (or Bingbot from October 2010) can reveal some fascinating oversights, errors or, dare I say, cloaking. SEOmoz’s toolbar and User Agent Switcher both offer the capability to switch user agents in Firefox.
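If you’d rather check from a script than a browser, the same idea can be sketched with Python’s standard library – the bot user-agent string below is Googlebot’s published one, and the URL is a placeholder:

```python
import urllib.request

# Sketch: build a request that identifies itself as a search engine bot.
# Fetch the page once with this request and once with a default browser
# UA, then diff the two responses to spot UA-based blocking or cloaking.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def build_request(url, user_agent):
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

req = build_request("http://www.example.com/", GOOGLEBOT_UA)
print(req.get_header("User-agent"))
# To actually fetch: urllib.request.urlopen(req).read()
```

A 403 (or wildly different content) for the bot request but not the browser request is exactly the kind of oversight described at the top of this post.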
Beyond 404s – server header checks that get missed
Beyond checking that your error pages produce a 404 (and that Google Webmaster Tools isn’t reporting too many), you might want to consider digging into your server header responses a little deeper. For example, a “304 Not Modified” is a response to an If-Modified-Since field in the client request header. In English: some web servers will respond with a “not modified” if the page requested hasn’t changed since the last time it was crawled. I’ve seen 304 responses handled really badly. In one situation, a website was responding normally to all requests except when the If-Modified-Since header field was present. The server, instead of returning the correct 304 response, collapsed spectacularly with a 403 error. Oops! Test your site with Feed the Bot’s awesome 304 header checker tool (one of my favourite SEO tools, ever). Are 304 responses worth worrying about? Yes, if you have a large site. Bing and Google both support If-Modified-Since requests, and crawl coverage data shows a clear difference for pages with and without the conditional response active.
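The conditional-request behaviour described above can be sketched like this – the formatting helper uses the standard library, and the classification logic is my own illustration of the healthy vs. broken cases, not code from the original post:

```python
from email.utils import formatdate

# Sketch: build an If-Modified-Since value and classify what comes back.
# A healthy server returns 304 for an unchanged page; the broken site
# described above returned 403 whenever If-Modified-Since was present.

def if_modified_since_header(timestamp):
    """Format a Unix timestamp as an HTTP-date for If-Modified-Since."""
    return formatdate(timestamp, usegmt=True)

def classify(status):
    if status == 304:
        return "not modified (correct)"
    if status == 200:
        return "full response (page changed, or conditional requests ignored)"
    return "broken conditional handling"

print(if_modified_since_header(0))  # Thu, 01 Jan 1970 00:00:00 GMT
print(classify(304))
print(classify(403))
```

Sending the header with a recent date against a page you know hasn’t changed is the quickest way to reproduce the 403 failure mode described above.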
While we’re on the subject of server headers
Ever look out for the X-Robots-Tag? X-Robots-Tag is part of the robots exclusion protocol (REP) and can be found in the server header response of a web page. You can noarchive, noindex and nofollow a page with an X-Robots-Tag, so it’s probably worth checking to see if something unexpected is lurking. You could even try checking for X-Robots-Tag with (and without) your user agent configured as a search engine… What are your oft-overlooked but seriously handy search engine accessibility checks?
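The X-Robots-Tag check above can be sketched as a small parser – the header value here is illustrative; in practice it comes from the response headers of a real request (try it with and without a bot user agent, as suggested above):

```python
# Sketch: parse an X-Robots-Tag header value and flag surprises.

def parse_x_robots(header_value):
    """Split an X-Robots-Tag value into a normalised set of directives."""
    return {d.strip().lower() for d in header_value.split(",") if d.strip()}

directives = parse_x_robots("noarchive, NOINDEX")
print(sorted(directives))  # ['noarchive', 'noindex']
if "noindex" in directives:
    print("Warning: this page is excluded from the index!")
```

A stray `noindex` in a header is exactly the kind of invisible, “nothing looks wrong in the HTML” problem this post is about.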
Featured Image credit: Arthur Chapman
About the Author, Richard Baxter
Richard Baxter (@richardbaxter) owns SEOgadget.co.uk – UK SEO Consultants helping people and organisations succeed in search. Richard has accrued valuable experience throughout his career in travel, engineering, recruitment, technology startup, retail and events industry SEO.