How search engines might define quality

21 September 2010 BY

There are a lot of factors playing a role in SEO, from a single line of code in your website to major changes in the way you run your business. One of the most important factors, however, is ‘high quality content’, often referred to as linkworthy content, fresh content or whatever other term you can make up. But how would you define quality content? It’s not easy to describe, and for search engines it’s even harder. Search engines can’t tell if certain content is high or low quality by reading it. They have to use signals to identify the quality of content. In this article I will explain the signals search engines might use to define quality.

First of all you have to keep in mind that search engines score documents as a whole. They can’t score the quality of parts of documents. Documents are the results shown in the search engine so those have to be scored.

Domain level signals

The domain on which a document is placed gives a lot of information about the overall quality on that domain. Search engines use different kinds of signals from the domain for determining the quality of a specific document.

Domain age: The domain age simply tells search engines how long a domain has existed. Search engines argue that valuable (legitimate) domains are often paid for several years in advance where illegitimate domains rarely are used for more than a year.

Existence of domain in directories: Some directories where submissions are edited by hand might still have a positive impact on the quality of a site as seen by search engines.

Prior site ranking: If your domain has previously ranked for a specific keyword, search engines might consider the domain a high quality source for topics concerning that keyword.

TrustRank and PageRank: The TrustRank of a domain tells search engines how much overall trust they can place in a website. The PageRank is a signal for the linkworthiness of the complete domain. Both factors are signals for the quality and trust of each document on that domain.

Document level signals

The document itself holds the most information about the quality of the document. Search engines use different kinds of signals on-page and off-page.

TrustRank: The TrustRank is a number that indicates the trust that search engines have that the document is not spam. “TrustRank is a method for separating reputable, good pages on the Web from web spam”. TrustRank is based on the distance (in links) of the document from hand-selected trusted sites. The closer the document is to trusted sites the higher the probability that it’s a trustworthy site itself. Trustworthiness can be a signal for quality.
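As a rough illustration of the "distance in links" idea, here is a minimal sketch in Python. It is not the published TrustRank algorithm, which propagates trust with a seed-biased PageRank computation; it only counts link hops from a hand-selected seed set. The toy link graph, the example domain names and the `decay` factor are all assumptions made for the illustration.

```python
from collections import deque


def link_distance_from_seeds(links, seeds):
    """Return the minimum number of link hops from any seed to each reachable page.

    links: dict mapping a page to the list of pages it links to (toy graph).
    seeds: iterable of hand-selected trusted pages.
    """
    distance = {seed: 0 for seed in seeds}
    queue = deque(distance)  # breadth-first search starts at the seeds
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:  # first visit is the shortest path
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


def trust_score(distance, decay=0.5):
    """Turn hop counts into a score in (0, 1]; the decay value is an assumption."""
    return {page: decay ** hops for page, hops in distance.items()}


# Hypothetical link graph: keys link to the pages in their lists.
links = {
    "trusted.example": ["news.example", "blog.example"],
    "news.example": ["blog.example", "shop.example"],
    "blog.example": ["spam.example"],
    "spam.example": [],
}
distances = link_distance_from_seeds(links, ["trusted.example"])
scores = trust_score(distances)
```

Pages that cannot be reached from any seed do not appear in the result at all, which in this sketch simply means "no trust information".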

Number of inbound links: The number of inbound links may be a signal for the quality of information in a specific document. The more people link to a document, the more likely it is that it contains high quality information.

Link profile of inbound links: Although the number of links might be an indication of the quality of the information, the types of links might matter even more. There are a few factors to take into account here. The trust of the source is one factor. This can be based on the TrustRank or PageRank of the page/domain. But there are also other signals, like the domain extension: .edu and .gov domains tend to have a higher trust than other domain extensions.

Links in the body text of a page appear to carry more weight than footer links. The reasoning is that linking from the body increases the chance of leading your visitors to that page, so you would only do it for sources you trust more.

Another factor concerning the link profile could be how ‘natural’ the link profile is. Here you can think of variance in anchor texts, the use of site-wide links, same-IP links etc.
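To make "variance in anchor texts" a bit more concrete, the sketch below scores a list of anchor texts with normalized Shannon entropy. This is only one possible way to quantify diversity and is not a documented search engine method; the function name and the example anchors are hypothetical.

```python
import math
from collections import Counter


def anchor_text_diversity(anchors):
    """Return a diversity score between 0.0 and 1.0 for a list of anchor texts.

    0.0 means every link uses the same anchor text; 1.0 means every link
    uses a different one. The score is Shannon entropy divided by the
    maximum entropy possible for that number of links.
    """
    counts = Counter(a.strip().lower() for a in anchors)
    total = sum(counts.values())
    if total <= 1 or len(counts) <= 1:
        return 0.0
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return entropy / math.log2(total)


# Hypothetical link profiles for two sites.
stuffed = ["cheap loans"] * 9 + ["example.com"]
varied = ["example.com", "this article", "Example Inc.", "click here"]

stuffed_score = anchor_text_diversity(stuffed)
varied_score = anchor_text_diversity(varied)
```

A profile in which nearly every link carries the same commercial anchor text scores close to 0, while a mix of brand names, URLs and generic phrases scores close to 1.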

Uniqueness of the document: Mainly copied content doesn’t have any added value for users. A document must have a certain amount of unique content to get classified as high quality at all.
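One common way to estimate how much of a document is copied is to compare overlapping word sequences ("shingles"). The sketch below is an illustration under that assumption, not a description of what any particular search engine does; the shingle size of three words and the sample sentences are arbitrary choices.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a, b):
    """Share of shingles two documents have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def unique_fraction(document, other_documents, size=3):
    """Fraction of the document's shingles not found in any other document."""
    own = shingles(document, size)
    if not own:
        return 0.0
    seen = set().union(*(shingles(d, size) for d in other_documents))
    return len(own - seen) / len(own)


original = "search engines use signals to judge quality"
copy = "search engines use signals to judge quality of pages"

similarity = jaccard_similarity(shingles(original), shingles(copy))
uniqueness = unique_fraction(copy, [original])
```

Here the copy shares five of its seven shingles with the original, so only a small part of it counts as unique.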

Staleness of document: The freshness or staleness of a document could influence the perceived quality of the document. For some kinds of topics older information could be outdated while recent information covers the topic much better. The freshness is not only determined by the age of the document (when it was crawled for the first time), but also by freshness of links, the latest changes to the document, growth of links to a document and changes in anchor texts. More on freshness of documents and changing content on SEO by the Sea.

Outbound linking: Linking to quality sites from your document might not help you in ranking that much, but linking to ‘bad neighborhoods’ definitely will harm your rankings. Yahoo! says: “Hyperlinks intended to help people find interesting, related content, when applicable.”

Over optimization/keyword stuffing: Search engines are to some extent able to recognize ‘natural texts’. Over optimization and keyword stuffing, however, they can recognize in a second, and it definitely degrades the perceived quality of a document. All search engines claim they value content created primarily for users and only secondarily for search engines.
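A very simple proxy for keyword stuffing is keyword density: the share of words in a text that are the target keyword. The sketch below assumes a single-word keyword, and the 10% threshold is an arbitrary value picked for the example, not a figure published by any search engine.

```python
import re
from collections import Counter


def keyword_density(text, keyword):
    """Return the fraction of words in text that equal the (single-word) keyword."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)


def looks_stuffed(text, keyword, threshold=0.10):
    """Flag text whose keyword density exceeds an assumed threshold."""
    return keyword_density(text, keyword) > threshold


stuffed = "Cheap shoes, cheap shoes! Buy cheap shoes."
natural = (
    "We tested twelve pairs of running shoes over three months "
    "and ranked them on comfort, grip, durability and price."
)

stuffed_density = keyword_density(stuffed, "cheap")
natural_density = keyword_density(natural, "shoes")
```

Density alone is a crude measure. Real texts on a narrow topic repeat their subject naturally, so a check like this can only ever be one signal among several.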

Load time: Although it’s a little bit of a long shot, the load time can also be a quality signal for search engines. Because a faster loading document creates a better user experience, the overall valuation of the document’s quality might be higher. It doesn’t mean the information on the page is better, but in the perception of the user it might be better.

User behavior signals

User behavior could be a strong indicator of the quality of a document, and search engines have filed multiple patents supporting this theory. Possible signals include CTR in the SERPs, bounce rates, time on page, overall traffic to a page, bookmarking etc. Despite the possible strength of these signals, search engines haven’t used them that much. The most important reasons are that the signals are noisy and easy to spam.

There are probably even more factors search engines consider in scoring the quality of a document. Can you add any?


Jeroen van Eck is a search engine marketing consultant at the online marketing company E-Focus in the Netherlands.
  • Yodeho

    There are 2 Trust Ranks. You quote from the Yahoo paper “Combating Web Spam with TrustRank”. But there is also a Google Trust Rank Patent. This one is different from the TrustRank developed by the writers of the Stanford/Yahoo paper.

    Google Trust Rank: ‘A search engine system provides search results that are ranked according to a measure of the trust associated with entities that have provided labels for the documents in the search results. A search engine receives a query and selects documents relevant to the query. The search engine also determines labels associated with selected documents, and the trust ranks of the entities that provided the labels. The trust ranks are used to determine trust factors for the respective documents. The trust factors are used to adjust information retrieval scores of the documents. The search results are then ranked based on the adjusted information retrieval scores.’

  • Dom Hodgson

    It’s a great article and a good starting point for everything, but I think a little too much is assumed without evidence:

    Search engines argue that valuable (legitimate) domains are often paid for several years in advance where illegitimate domains rarely are used for more than a year.

    Buying a domain name for 10 years costs less than $100. If you’re going to be spamming and burning the domain, $100 isn’t a big investment, so that point needs to be pushed a bit more.

    I’m not out to trash the article. As I said, it’s a good starting point, but if you’re going to make a point, you need to back it up with some evidence and/or testing.

    The links to SEO by the Sea help your argument, but everything in this article has been debated hundreds of times previously. Link to a few resources and start to quote them (also helps pad out your article ;) )

    [This is why I don't write about search :) ]

  • Jeroen van Eck

    @Dom Thanks for your response.
    To clarify: This article was meant as a starting point. I’m not trying to tell how much influence every factor has, but which factors search engines MIGHT use. I’m trying to make the abstract term ‘quality content’ a little bit easier to grasp.
    Every factor is debatable and has probably been debated. Referring to each debate is a little too much, I think. If you want to know more about a specific factor you can just Google it ;) or ask about it here.

  • Vic

    With the latest updates made by Google today, we also need to adjust our techniques in building links. But creating quality content that provides solutions to people will always be one of the most important tactics to drive traffic.

