Panda in detail

This is a guest post by Peter van der Graaf. Peter is a big fan of behavioral psychology and mathematics. He mainly helps his clients with their internal SEO evangelism, link-building strategies and international SEO efforts. The scale at which he and his partners perform search algorithm tests has the potential to give great insights.

I was surprised to find out how much has been written about the Google Panda update and how little information has been shared so far about what is really happening.

Machine learning

Google’s algorithm for identifying the characteristics of unnatural pages is periodically updated by a machine-learning background job. This means it is not a live algorithm! The much-reported Panda versions 1.0 to 2.5 are algorithm changes that are first calculated on a training dataset; combined with the existing learnings, they are then exported to the live Google environment as more static algorithm tests.

This means that while bounce rate (in this case: visitors returning to search results quickly) isn’t used as a direct ranking factor, it is used to teach the Panda new tricks. Signals like bounce rate are fed as bamboo to the Panda background system with the instruction to find out what patterns can be derived from characteristics that form thin content, unnatural text and excessive on-page advertising. The system picks out combinations of attributes that together give a high degree of certainty about someone’s spammy activities.

For those familiar with “distributed tree learning”, look up the work of Google engineer Biswanath Panda, after whom the Panda update was named. He explains how continuously splitting sites into groups with similar attribute values lets you afterwards derive which attributes affected a certain outcome (like a high bounce rate) the most. It also gives some indication of the thresholds to be used, and it can signal when false positives or negatives are likely to occur.
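To make the idea concrete, here is a toy sketch in Python of the single most basic step such a tree learner performs: trying every attribute and threshold to find the split that best separates spammy sites from clean ones. The site data, attribute names and labels are invented for illustration; real systems do this at enormous scale, recursively, over many more signals.

```python
# Toy single-split "tree learning" step: find which attribute and
# threshold best separate spammy sites from clean ones.
# Site data and attribute names are invented for illustration.

def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 = perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(sites, attributes):
    """Return (attribute, threshold, impurity) of the best binary split."""
    labels = [s["spammy"] for s in sites]
    best = (None, None, gini(labels))
    for attr in attributes:
        for threshold in sorted({s[attr] for s in sites}):
            left = [s["spammy"] for s in sites if s[attr] <= threshold]
            right = [s["spammy"] for s in sites if s[attr] > threshold]
            # Weighted impurity of the two groups after the split
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(sites)
            if score < best[2]:
                best = (attr, threshold, score)
    return best

sites = [
    {"ad_density": 0.7, "duplicate_ratio": 0.9, "spammy": 1},
    {"ad_density": 0.6, "duplicate_ratio": 0.8, "spammy": 1},
    {"ad_density": 0.2, "duplicate_ratio": 0.1, "spammy": 0},
    {"ad_density": 0.3, "duplicate_ratio": 0.2, "spammy": 0},
]
attr, threshold, score = best_split(sites, ["ad_density", "duplicate_ratio"])
print(attr, threshold, score)  # the split that best explains "spammy"
```

Repeating this split on each resulting group grows a tree; the attributes chosen near the root are the ones that explain the outcome the most, which is exactly the kind of insight the article describes.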

Whether Panda will ever become a live (continuously updated) algorithm remains to be seen. It may even be that the derived tests become so effective that no further updates are required.

Steep or sloping threshold?

Because Panda combines large numbers of factors, it seems to be more certain of its outcome. While existing algorithms for unnatural behavior used a sloping threshold, in which increasing evidence pushed you gradually towards lower rankings, Panda currently uses a more thorough approach.

Gradually increasing the degree of unnatural text maintained existing rankings for quite some time, but eventually resulted in a steep drop in ranking for all tested websites. The individual elements within the algorithm for thin content are hard to reverse engineer, but once you cross a certain point you are sure to be hit. Because signals are inspected in combinations that include link value attributes, not every site has the same threshold.
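The difference between the two behaviors can be sketched in a few lines of Python. The cut-off value and penalty scale below are invented for illustration; the point is only the shape of the curves, not the numbers.

```python
# Contrast of two penalty curves: a sloping threshold degrades ranking
# gradually with the amount of unnatural text, while a steep (step)
# threshold leaves ranking intact until a cut-off, then drops it at once.
# Cut-off and penalty values are invented for illustration.

def sloping_penalty(spam_score, slope=1.0):
    """Ranking penalty grows proportionally with the evidence."""
    return slope * spam_score

def steep_penalty(spam_score, cutoff=0.6, drop=10.0):
    """No penalty until the cut-off is crossed, then a large fixed drop."""
    return drop if spam_score > cutoff else 0.0

for score in (0.2, 0.5, 0.7):
    print(score, sloping_penalty(score), steep_penalty(score))
```

This matches the test result described above: rankings held steady while unnatural text increased, then collapsed all at once past a certain point.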

You might even argue that Panda has replaced a previous algorithm that had a sloping threshold, because many sites with thin content below the Panda threshold have returned in top-10 positions.

Domain, section or page based effect?

Panda affects large amounts of pages within the same domain. It doesn’t target long-tail keywords, but pages with these keywords tend to be in sections with many pages that have low quality content.

Sections of pages can be grouped by many factors like block element buildup. Once a threshold within these pages is reached, all pages in the section are affected, including ones with a slightly higher quality.

Once you have been hit, recovering requires more effort than just raising quality back above the threshold. Changing domain, however (including a 301 redirect), seems to restore your ranking if you were only barely below the threshold. Just changing URLs within the same domain doesn’t seem to have this effect.

Solution against Panda plagues

Sites with large numbers of pages below a quality threshold are targeted by Panda. If you reuse sentences in which only a couple of keywords are replaced compared to other pages, if you have a lot of content copied from other websites, if you make a lot of spelling or grammatical errors, or if you place excessive ads on your pages, be prepared for Panda claws. Assuring quality for all pages might be hard, but make sure you do so for all pages that are important for your visitors and for Google. All pages below a reasonable level of quality should be removed or excluded from the Google index (canonical tag, noindex, etc.).
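As a minimal sketch of the last point, the snippet below gates pages on a quality score and emits a robots `noindex` meta tag for anything below it, so thin content never reaches the index. The scoring scale and the 0.5 threshold are invented for illustration; the meta tag itself is the standard robots directive.

```python
# Sketch of a quality gate for the index: pages below a quality score
# get a robots "noindex" meta tag so thin content is kept out of Google.
# The quality scale and threshold are invented for illustration.

NOINDEX_TAG = '<meta name="robots" content="noindex">'

def robots_meta(quality_score, threshold=0.5):
    """Return the meta tag to emit for a page, or '' if it may be indexed."""
    return NOINDEX_TAG if quality_score < threshold else ""

print(robots_meta(0.2))  # thin page: excluded from the index
print(robots_meta(0.9))  # quality page: indexable
```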

Pages with sentences like “no results found for [keyword]” are often crawlable by Google. Pages like this can be mistaken for ill intent, and should also be taken into account.

If that doesn’t work, you can always build a Panda trap.

Hopefully this article has clarified some misconceptions. Note that this is the consensus of many search experts and represents the supposed current situation. If there is any proof to refute this article, please comment. We’re all more than willing to learn.

About State of Digital Guest Contributor

This post was written by an author who is not a regular contributor to State of Digital. See all the other regular State of Digital authors here. Opinions expressed in the article are those of the contributor and not necessarily those of State of Digital.

9 thoughts on “Panda in detail”

  1. Great stuff Peter, thanks for sharing your insights. Probably the most accurate description of what Panda really is that’s been published on the interwebz so far.

  2. So, the supposition is that on-page and off-page links (that is to say, inlinks) and link-based metrics play no role in Panda? Is it just correlation, then, that many of the sites pandalized also seem to have spammy backlinks?

    twitter: @joshbachynski

    1. Signals found by the Panda background system definitely include link profile signatures. I tried affecting various different websites by increasing unnatural content at a steady pace. All sites that were affected only after an enormous amount of spam had one thing in common compared to the ones that were affected early on: they had a much more authoritative incoming link profile.

      I wouldn’t say that it proves anything, but “links have an effect on the threshold” is definitely a conclusion you could derive from various tests. Anyone with similar or contradicting evidence?

  3. Great post. Biswanath’s paper should be required reading for any SEO trying to understand the Panda algo. I think it also highlights how easy it is for ‘innocent’ sites to get nailed by the algo and how the recent change to ‘sessions’ will bring about the panda penalty.

  4. Panda has gotten more sophisticated and they’ve added Penguin to it. Now it’s extremely harsh for the newcomers especially. Breaking into the search engine market is harder than ever.


Comments are closed.