Understanding Search Engines

What we commonly refer to as Search Engine Optimisation is something I liken to treating a disease. The various treatments of SEO, from title tags and headers to information architecture and link building, all focus on the symptoms of that disease.

Truly understanding the disease – i.e. going beyond the symptoms and superficial treatment – requires a deeper understanding of SEO. It requires that you understand how search engines work.

This aspect of SEO is, in the opinion of many including me, thoroughly neglected in our industry. Too many search engine optimisers focus exclusively on the symptoms and are entirely oblivious to the underlying causes.

Understanding how search engines work and how their engineers tackle problems is a vital aspect of becoming a truly well-rounded SEO specialist.

Here are three methods of increasing your understanding of search engines and the people who create and refine them:

Information Retrieval (IR)

Every search engine is in essence an information retrieval system. Information retrieval can be described as “the science of finding information contained within documents and/or within metadata about documents”.

IR was around long before the world wide web was born. From the moment computer scientists started storing information in early computers, IR was needed to retrieve that information. The roots of modern day IR can be traced back to an essay written in 1945 by Vannevar Bush called As We May Think (go ahead and read it, it’s phenomenal).

All search engines are IR systems. They’re highly complex and intricate IR systems, but based on the same principle: retrieve information from a set of documents with the highest degrees of precision and recall.
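To make precision and recall a little more concrete, here is a minimal sketch in Python. It uses the standard textbook definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The function name `precision_recall` and the document IDs are made up for illustration; this says nothing about how any real search engine measures quality internally.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document IDs the system returned
    relevant:  document IDs a human judged relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four results returned, three documents truly relevant.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc7"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.50 recall=0.67
```

Two of the four results are relevant (precision 0.50), and two of the three relevant documents were found (recall 0.67). Returning more results tends to raise recall and lower precision, which is the trade-off every IR system has to balance.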

While understanding the basics of IR won’t mean you’ll be able to penetrate the intricacies of Google’s algorithms, it does help with your mental picture of how search engines retrieve and rank results. And this in turn can help you distinguish fact from fiction when faced with the latest SEO hype.

Here are some starting points to get under way with learning Information Retrieval:
»» Introduction to Information Retrieval – a free ebook from Stanford that’s a great beginner’s guide to IR
»» Information Retrieval Wikipedia page – not for the faint of heart due to the abundance of sciencey lingo
»» How does Google collect and rank results? – an old PDF article from Google that serves as a low-brow introduction to IR


Patents

Google, Bing, and Yahoo constantly file new patent applications covering all kinds of aspects of their search engines. From how links are recognised within an HTML document to advanced features of personalized search, patents are filed all the time so that search engines can claim the rights to their latest features and improvements.

Once these patent applications are published they become part of the public record, which means we can read them as well. Staying on top of these patents is a great way to understand how search engineers think and how search engines tackle challenges such as personalization and spam detection.

However, there are some things you need to keep in mind. First, patents are filed for nearly every new feature and improvement the engineers at Google et al. can come up with. That doesn’t mean that these patents are actually used. A patent is meant to capture the rights to a specific technology; it doesn’t oblige the holder to use it.

Second, it’s important to come to grips with the specific lingo of patents. As with any legal discipline, patent applications come with their own language and methods of phrasing. Without a proper understanding of the patent lingo it’s all too easy to become lost in it and risk making unfounded assumptions about patents and their usefulness.

So it’s probably a good idea to leave patent analysis to the specialists. People like Bill Slawski and David Harry read and dissect search engine patents on a nearly daily basis, and they’re not shy about sharing their nuggets of wisdom. Follow their blogs and see what they have to say about the latest patents:
»» SEO by the Sea – Bill Slawski’s musings on patents and SEO
»» The Firehorse Trail – Dave Harry’s blog on all things SEO, with the occasional patent analysis

Research Papers

Search engines don’t just publish patents; their engineers also publish research papers. As computer scientists, search engineers are adding to the amassed knowledge of the human species. Through research papers engineers can obtain degrees, share knowledge with other specialists in the field, and supply food for fruitful discussion within the scientific community.

Reading these research papers will help you further understand the mindset of search engineers and the goals they’re trying to accomplish with every new tweak and feature of a search engine.

Again it’s important not to read too much into these papers, as what they describe is usually ‘old news’. But they nonetheless provide great insight into how search engines approach new challenges, which in turn will help you with your own SEO challenges.

»» SEO Higher Learning – an extensive list of advanced IR, patents and research paper resources compiled by Dave Harry, with a long list of very interesting papers that’ll keep you reading for months.

These resources should get you started on a path towards a deeper understanding of search engines, something that will come in handy many times in the course of your SEO career. If you have any other resources to share, submit them in the comments.

About Barry Adams

Barry Adams is one of the chief editors of State of Digital and is an award-winning SEO consultant delivering specialised technical SEO services to clients worldwide.

10 thoughts on “Understanding Search Engines”

  1. Hi Barry, nice article. It’s an incredibly obvious point for me to make, but if we are talking about treatment of a disease I really think that there needs to be the clinical trial element, if you know what I mean? You can read all of the secondary resources (read the theory) of how to do it, or how the solution should be applied, but if you don’t actually try out your ‘formulation’ on the subject how are you going to know the correct solution and dose?

  2. @Caroline yeah sometimes there just aren’t enough hours in the day to get everything done you want to do. 🙂

    @Andy very true, there is no substitute for hands-on experience. But a solid theoretical foundation is necessary – as it is in medicine – to be able to do the best job possible.

  3. The funny photo says a million words.

    Sometimes, I research the veracity of news. The entire process takes about one minute on Google. Amazing!

  4. Great piece Barry; though for me you’ve missed the most fundamental element to understanding any search engine – which is profit. Profit is the ultimate goal of any search engine, ever since Google switched Adsense to a CPC model and overnight became one of the most beautifully profitable business models in existence.

    Of course each team has its own component set of goals and objectives, e.g. search quality, search relevance, search innovation – however each of these components has a place on the agenda only insofar as improvements to these elements have consistently led to improvements in profit margin.

    I find that keeping the goal of profitability front and centre when considering how to optimise a website for now and the future is probably the most useful tip I can give.

  5. @Nichola, good point and indeed an omission in my post. We all would do well to keep in mind that Google and other search engines exist to make profit, and that whatever they do they do it to maximise profits. In Google’s case, with 99% of their profits coming from their advertising platforms, this means getting more people using Google products and keeping them on Google SERPs as long as possible.

  6. very interesting. . . the top indicator shows the article was written 2 years ago. . .

    the comments below record their submissions as 3 years ago. . .

    this internet thingy is amazing … !
    you were getting feedback on your article a year before you wrote it. . .

    simply amazing. . . (LOL)
