And that is the wrong question to ask.
Before I can explain why, it’s important that we first have a basic understanding of how search engines actually work. I’ve written about my ‘Three Pillars of SEO’ approach before, which is based on a simplified model of web search engines like Google. I’ll summarise the main points here:
Three Search Engine Processes
In a nutshell, most information retrieval systems have three main processes:
- Crawler
- Indexer
- Query Engine
The crawler is all about discovery. At heart, its purpose is straightforward: find all URLs and crawl them. It is actually a pretty complicated system, with subprocesses handling (to name but a few) seed sets, crawl queuing and scheduling, URL importance, and server response time monitoring.
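To make the discovery idea concrete, here is a minimal sketch of crawling outward from a seed set. It assumes a tiny hypothetical link graph (`LINK_GRAPH`) in place of real HTTP fetches, and the `discover` function is illustrative only; a real crawler layers scheduling, URL importance and politeness on top of this basic loop.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List

# Hypothetical link graph standing in for the web: each URL maps to the
# URLs found in its HTML source. Real crawlers fetch pages over HTTP.
LINK_GRAPH: Dict[str, List[str]] = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/c"],
    "https://example.com/b": ["https://example.com/a", "https://example.com/"],
    "https://example.com/c": [],
    "https://example.com/orphan": [],  # nothing links here
}


def discover(seeds: Iterable[str], fetch_links: Callable[[str], List[str]]) -> List[str]:
    """Breadth-first discovery: crawl each known URL once, queue new links."""
    unique_seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(unique_seeds)
    queue = deque(unique_seeds)
    crawl_order: List[str] = []
    while queue:
        url = queue.popleft()
        crawl_order.append(url)
        for link in fetch_links(url):
            if link not in seen:  # only queue URLs we have not seen before
                seen.add(link)
                queue.append(link)
    return crawl_order


order = discover(["https://example.com/"], lambda url: LINK_GRAPH.get(url, []))
print(order)
```

The orphan URL is never crawled, because nothing reachable from the seed set links to it, directly or indirectly.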
The crawler also has a parsing module which looks at the HTML source code of what is being crawled and extracts any links that it finds. The parser does not render pages; it just analyses the source code and extracts any URLs found in <a href="…"> snippets.
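A minimal sketch of that parsing step, using Python's standard-library `html.parser` as a stand-in (Google's own parser is not public): it reads the raw source and collects `href` values from `<a>` tags, so a link that only exists after JavaScript runs is invisible to it. The sample page and its URLs are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags in raw HTML source, without rendering."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> List[str]:
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


SOURCE = """
<html><body>
  <a href="/products">Products</a>
  <span onclick="location.href='/hidden'">Click handler, not a link</span>
  <script>
    const link = document.createElement('a');
    link.href = '/js-only';
    document.body.appendChild(link);
  </script>
</body></html>
"""

print(extract_links(SOURCE))  # ['/products']
```

Only `/products` is found. The `/js-only` link would exist in the rendered DOM but not in the source code this parser sees, and the `onclick` URL is not an `<a href>` at all.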
When the crawler sees URLs that are new or changed since its last visit, it sends them to the indexer. The indexer then tries to make sense of the URL, analysing its content and relevancy. Here we also have a lot of subprocesses looking at things like page layout, canonicalisation, and evaluating the link graph to determine a URL’s PageRank (because, yes, Google still uses that metric internally to determine a URL’s importance).
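The PageRank idea itself can be sketched as a simple power iteration over a link graph. This is the textbook formulation with a damping factor of 0.85, not Google's internal implementation, and the graph and URLs are hypothetical.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Textbook PageRank via power iteration (not Google's implementation)."""
    all_nodes = set(graph)
    for targets in graph.values():
        all_nodes.update(targets)
    nodes = sorted(all_nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start with a uniform score

    for _ in range(iterations):
        # Every page gets the baseline "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = list(dict.fromkeys(graph.get(node, [])))  # dedupe links
            if targets:
                # Split this page's score evenly across its outlinks.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its score across all pages.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


links = {
    "/": ["/a", "/b"],
    "/a": ["/"],
    "/b": ["/", "/a"],
    "/c": ["/a"],
}
ranks = pagerank(links)
for url, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{url}: {score:.3f}")
```

In this toy graph, `/` ends up with the highest score because both `/a` and `/b` link to it, while `/c` has no inbound links and keeps only the baseline share.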
Googlebot or Caffeine?
The confusion all starts when people – be they SEOs, developers, or even Googlers themselves – say ‘Googlebot’ (the crawler) but actually mean ‘Caffeine’ (the indexer). This confusion is entirely understandable, because the nomenclature is used interchangeably even in Google’s own documentation:
Right. But the contradictory text in the WRS documentation remains, so it’s entirely forgivable for SEOs to confuse the two processes and just call it all ‘Googlebot’. Even the most experienced and knowledgeable SEOs in the industry do it all the time.
And that’s a problem.
Crawling, Indexing, and Ranking
Yes, actually. We do need to know that.
Which, to its credit, Google will actually do. But that takes time, and a lot of interplay between the crawler and indexer.
And, as we know, Google does not have infinite patience. The concept of ‘crawl budget’ – an amalgamation of different concepts around crawl prioritisation and URL importance (Dawn Anderson is an expert on this) – tells us that Google will not try endlessly to crawl all your site’s pages. We have to help a bit and ensure that the pages we want to be crawled and indexed are easily found and properly canonicalised.
And because pages are crawled and rendered according to their perceived importance, you could see Google spend a lot of time crawling and rendering the wrong pages, and very little time on the pages you actually want to rank.
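The interaction between crawl budget and perceived importance can be illustrated with a toy scheduler. This is a minimal sketch, assuming a fixed budget per visit and a single importance score per URL; both the scores and the `schedule_crawl` helper are hypothetical, and Google's actual scheduling is far more involved.

```python
import heapq
from typing import Dict, List


def schedule_crawl(importance: Dict[str, float], budget: int) -> List[str]:
    """Fetch URLs in order of perceived importance until the budget runs out."""
    # heapq is a min-heap, so negate scores to pop the most important first;
    # ties are broken alphabetically by URL.
    frontier = [(-score, url) for url, score in importance.items()]
    heapq.heapify(frontier)
    crawled: List[str] = []
    while frontier and len(crawled) < budget:
        _, url = heapq.heappop(frontier)
        crawled.append(url)
    return crawled


# Hypothetical importance scores: parameterised filter URLs have been
# (wrongly) scored above the product pages the site actually wants to rank.
importance = {
    "/shoes?colour=red&size=9": 0.9,
    "/shoes?colour=blue&size=9": 0.9,
    "/shoes?sort=price": 0.8,
    "/shoes/trail-runner-x": 0.3,
    "/shoes/road-racer-2": 0.2,
}

print(schedule_crawl(importance, budget=3))
```

With a budget of three fetches, neither product page is crawled: the parameterised URLs soak up the budget because they were scored as more important. In this toy model, lowering the scores of those URLs (for example through canonicalisation) is what would let the product pages into the budget.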
Good SEO is Efficiency
Over the years I’ve learnt that good SEO is, in large part, about making search engines’ lives easier. When we make our content easy to discover, easy to digest, and easy to evaluate, we are rewarded with better rankings in SERPs.