Understanding Search Engines
I like to compare what we commonly call Search Engine Optimisation to the treatment of a disease. The various SEO treatments, from title tags and headers to information architecture and link building, all focus on the symptoms of that disease.
Truly understanding the disease – i.e. going beyond the symptoms and superficial treatment – requires a deeper understanding of SEO. It requires that you understand how search engines work.
This aspect of SEO is, in the opinion of many, myself included, thoroughly neglected in our industry. Too many search engine optimisers focus exclusively on the symptoms and are entirely oblivious to the underlying causes.
Understanding how search engines work and how their engineers tackle problems is a vital part of becoming a truly well-rounded SEO specialist.
Here are three methods of increasing your understanding of search engines and the people who create and refine them:
Information Retrieval (IR)
Every search engine is in essence an information retrieval system. Information retrieval can be described as “the science of finding information contained within documents and/or within metadata about documents”.
IR was around long before the World Wide Web was born. From the moment computer scientists started storing information in early computers, IR was needed to retrieve that information. The roots of modern-day IR can be traced back to an essay written in 1945 by Vannevar Bush called “As We May Think” (go ahead and read it, it’s phenomenal).
All search engines are IR systems. They’re highly complex and intricate IR systems, but based on the same principle: retrieve information from a set of documents with the highest degrees of precision and recall.
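To make precision and recall concrete, here is a minimal sketch in Python. Precision is the share of retrieved documents that are relevant; recall is the share of relevant documents that were retrieved. The function name `precision_recall` and the document IDs are made up for illustration and don’t come from any particular search engine.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document IDs the system returned
    relevant:  document IDs a human judged relevant to the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four documents returned, three are truly relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.50 recall=0.67
```

Two of the four returned documents are relevant (precision 0.50), and two of the three relevant documents were found (recall 0.67). Returning more documents tends to raise recall and lower precision, which is the trade-off every IR system has to manage.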
While understanding the basics of IR won’t mean you’ll be able to penetrate the intricacies of Google’s algorithms, it does help with your mental picture of how search engines retrieve and rank results. And this in turn can help you distinguish fact from fiction when faced with the latest SEO hype.
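As a rough mental picture of “retrieve and rank”, the toy sketch below builds an inverted index (a map from each term to the documents containing it) and ranks documents by how often the query terms occur in them. This is a deliberately naive illustration, not how Google or any real engine scores results; the function names and sample documents are hypothetical.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: term -> {doc_id: number of occurrences}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(index, query):
    """Return doc IDs ranked by total query-term occurrences (ties by ID)."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked]


# Hypothetical mini-collection of documents.
docs = {
    "a": "title tags and header tags",
    "b": "link building for beginners",
    "c": "title tags explained",
}

index = build_index(docs)
print(search(index, "title tags"))
# prints: ['a', 'c']
```

Document "a" ranks first because the query terms occur three times in it versus twice in "c", while "b" is never retrieved because it contains neither term. Real engines layer many more signals on top, but the retrieve-then-rank shape is the same.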
Here are some starting points to get under way with learning Information Retrieval:
»» Introduction to Information Retrieval – a free ebook from Stanford that’s a great beginner’s guide to IR
»» Information Retrieval Wikipedia page – not for the faint of heart due to the abundance of sciencey lingo
»» How does Google collect and rank results? – an old PDF article from Google that serves as a low-brow introduction to IR
Patents
Google, Bing, and Yahoo constantly file new patents covering all kinds of aspects of their search engines. Everything from how links are recognised within an HTML document to advanced features of personalised search is covered, and patents are filed all the time so that search engines can claim the rights to their latest features and improvements.
Once these patent applications are published they become part of the public record, which means we can read them as well. Staying on top of these patents is a great way to understand how search engineers think and how search engines tackle challenges such as personalisation and spam detection.
However, there are some things you need to keep in mind. First, patents are filed for nearly every new feature and improvement the engineers at Google et al. can come up with. That doesn’t mean these patents are actually used. A patent is meant to secure the rights to a specific technology; it doesn’t oblige the holder to use it.
Second, it’s important to come to grips with the specific lingo of patents. As with any legal discipline, patent applications come with their own language and methods of phrasing. Without a proper understanding of the patent lingo it’s all too easy to get lost in it and make unfounded assumptions about patents and their usefulness.
So it’s probably a good idea to leave patent analysis to the specialists. People like Bill Slawski and David Harry read and dissect search engine patents on a nearly daily basis, and they’re not shy about sharing their nuggets of wisdom. Follow their blogs and see what they have to say about the latest patents:
»» SEO by the Sea – Bill Slawski’s musings on patents and SEO
»» The Firehorse Trail – Dave Harry’s blog on all things SEO, with the occasional patent analysis
Research Papers
Search engines don’t just publish patents; their engineers also publish research papers. As computer scientists, search engineers are adding to the amassed knowledge of the human species. Through research papers engineers can obtain degrees, share knowledge with other specialists in the field, and supply food for fruitful discussion within the scientific community.
Reading these research papers will help you further understand the mindset of search engineers and the goals they’re trying to accomplish with every new tweak and feature of a search engine.
Again, it’s important not to read too much into these papers, as what they describe is usually ‘old news’. But they nonetheless provide great insight into how search engines approach new challenges, which in turn will help you with your own SEO challenges.
»» SEO Higher Learning – an extensive collection of advanced IR, patent, and research paper resources compiled by Dave Harry, including a long list of very interesting papers that’ll keep you reading for months
These resources should get you started on a path towards a deeper understanding of search engines, something that will come in handy many times in the course of your SEO career. If you have any other resources to share, submit them in the comments.