Solving Duplicate Content Issues

25 January 2011 BY Nichola Stott

Yesterday I read an extremely useful and practical walk-through on page-level search engine indexation by Richard Baxter at SEO Gadget. In the post, Baxter looks at a variety of data sources, from the simplest, such as Webmaster Tools, which will tell you how many URLs are in your sitemap and how many of those are indexed (the difference between the two being the figure of interest), to more robust methods of actually identifying which of the site's pages are errant, in this case by building a Mozenda scraping agent. It's definitely worth a read for anyone working on large websites (particularly ecommerce sites) with multiple instances of a similar page type, e.g. product-level pages.

Setting aside issues of information architecture and internal and external linking, one of the most common issues we see at theMediaFlow when presented with sites that have page-level indexation problems is duplicate content. Cross-domain duplicate content carries related and additional challenges; in this case, however, I wanted to share some experiences of solving page indexation issues when duplicated content is present on product-level pages.

First off, duplicate content doesn't necessarily mean that you have an exact replica of a page occurring again on the site; rather, the page is found to be so substantially similar to another that it is not indexed. Essentially, the page is found to add nothing of variety or additional value when compared to another page on the site. The natural and simple response would be to vary the page content by hand; however, this is a problem most common with large ecommerce websites, where on occasion such a solution is not practical. As an example, if a website sells screws, and happens to have fifty different sizes of screw, available in two different materials, for seven different applications (from exterior wood to interior masonry), then that's 50 × 2 × 7 = 700 pages that could potentially be considered inherently the same.*

*That is imagining the site has been built to display one finite product type per page – which actually wouldn't be best practice from an information-architecture perspective; in this example, however, we're looking at how to solve problems after the fact.

If you have a large site with page-level indexation issues which seem to be attributable to substantially similar page-level content, here are a few points to consider:

1. Lead with the point of difference

Ensure your point of difference is "front-loaded" in the meta title and on-page title tags, e.g. "20mm brass exterior-wood screw" as opposed to "exterior wood screws – 20mm brass".
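As a sketch of what that looks like in the page source (the product and shop name are invented for illustration), the differentiator leads both the title tag and the on-page heading:

```html
<!-- Hypothetical product page: the point of difference ("20mm brass exterior-wood")
     leads the title, with generic site/category words pushed to the end -->
<head>
  <title>20mm Brass Exterior-Wood Screw | Example Screw Shop</title>
</head>
<body>
  <h1>20mm Brass Exterior-Wood Screw</h1>
</body>
```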

2. Remove boiler-plated text

"Boiler-plate" text is a passage or paragraph of standardised text, such as the "About the Company" section at the foot of a press release. You will often find boiler-plated text on ecommerce sites, on all pages where the product includes a common feature: for example, if all the pages about brass screws on our example site carry exactly the same paragraph about the quality of the brass and the ratio of copper to zinc used in the alloy.

Instead, consider creating a new page or section on the site – "our materials", say – and housing such content there, then placing a link to the appropriate materials reference in the product page footer, as opposed to a wholesale duplicated paragraph.
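A minimal sketch of that approach (the URL and anchor are invented): the shared paragraph lives once on a central "our materials" page, and each product page carries only a short line linking to it:

```html
<!-- Product page footer: one line plus a link to the shared materials page,
     instead of the duplicated paragraph about the brass alloy -->
<footer>
  <p>Learn more about the brass we use on
     <a href="http://example.com/our-materials#brass">our materials page</a>.
  </p>
</footer>
```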

3. Canonize a product page

To canonize a page – i.e. to select a preferred page out of a group of very similar pages and implement a canonical tag – is a solution that evolved explicitly out of the issues posed by duplicate (or substantially similar) content.

  • Group your substantially similar pages (e.g. all exterior wood screws in brass, from smallest to largest size)
  • Identify a suitable page to be canonized (it would make sense to consider which page may already be the strongest, by looking at internal and external links to these pages, and also which product may sell best)
  • Make a note of the URL of the canonical page, e.g. http://example.com/screws/30mm-brass-exterior-wood.htm
  • Implement the rel=canonical tag in the HTML head of all the substantially similar pages in the group

<link rel="canonical" href="http://example.com/screws/30mm-brass-exterior-wood.htm" />

If you're unfamiliar with the canonical tag, be aware that there are a number of pitfalls, and poor implementation can have serious consequences. I would suggest you begin with a careful read of this excellent post by Lindsay at SEOmoz, which is an extremely thorough guide to correct implementation of rel=canonical.
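One pitfall worth illustrating (the URLs follow the example domain used above): a canonical tag hard-coded into a site-wide template points every page at the homepage, effectively telling search engines the entire catalogue is a duplicate of it:

```html
<!-- WRONG: hard-coded in a shared template, every product page declares
     itself a duplicate of the homepage and risks being dropped from the index -->
<link rel="canonical" href="http://example.com/" />

<!-- RIGHT: each group of substantially similar pages points to the one
     canonical page chosen for that group -->
<link rel="canonical" href="http://example.com/screws/30mm-brass-exterior-wood.htm" />
```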

4. Handwrite a bespoke original sentence per page

Add a sentence or two of completely bespoke text, as a product description or a bulleted feature line, on each product page. On occasion this solution may be all that is required; however, if you have an extremely large number of substantially similar pages, or if this solution isn't practical for whatever reason, then I would strongly recommend you at least add some unique text to your canonized page.
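For example (the product copy here is invented), a single handwritten feature line can be enough to differentiate an otherwise templated page:

```html
<!-- Hypothetical bespoke feature line, unique to this one product page -->
<ul class="features">
  <li>The 20mm length is our most popular choice for fixing battens to fence posts.</li>
</ul>
```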

Finally, whilst this post looks at ways to solve existing duplicate content issues as an explicit cause of incomplete indexation, it must be noted that such issues can often be avoided completely with good information architecture at the outset.

AUTHORED BY:

Nichola Stott is owner and co-founder of theMediaFlow; online revenue optimisation and audience services (including SEO, SEM and SMM). Prior to founding theMediaFlow, Nichola spent four years at Yahoo! as head of UK commercial search partners.
  • http://twitter.com/#!/kevinjgallagher @kevinjgallagher

    What about detecting duplicate content? I know there is Copyscape, but are there any other tools you would recommend?

    • http://socialr.com.au/ Glenn Comanda

      If you have a Moz Pro account, it has a Duplicate Content Errors report and will list all of your pages that are similar to your other pages.


  • http://www.mediarunsearch.co.uk/blog Paul North

    Spot on advice. I’d caution any ecommerce site owners reading this and thinking “well I don’t do any of that and I’m fine so far”. You can get away with it for a while but there is a tipping point where Google says “enough is enough” and shoves your site into the omitted results. Sites at particular risk are those that retail products and take copy, meta data and images from suppliers and distributors. All of this is duplicate content and you’ll see it in numerous other places on the internet.

  • http://trafficcoleman.com/ TrafficColeman

    Dup content scares people out of their minds..but all they have to do is just keep track of what they're writing and make sure they're not submitting something twice..just that easy..

    “Black Seo Guy “Signing Off”

  • http://www.themediaflow.com/ Nichola Stott

    @Paul – I could not agree with you more, and in fact any sites with multiple instances of substantially similar pages that were labouring under a false sense of security were definitely affected by the so-called May Day update last year. In addition to retail sites I'd also add affiliate sites to the "particular risk" category, which often have on-site and cross-domain duplicate content issues to look out for. I didn't really touch on that in the post, as it's a whole other can of worms ;-)

    @TrafficColeman – I really don't think it is that easy for enterprise-level sites with, say, 1 million product-level pages and a high churn rate. Such sites are database-driven, and trying to manually record and differentiate content by hand is just not possible or sensible for any business.

  • http://yoyoseo.com Dana Lookadoo

    Your takeaways are easy to implement for ecommerce sites, Nichola. This is a must-read to help site owners understand the implications and fixes.

    I especially like “front-loaded” tips regarding title tags. I recommend site owners take the time to keep all these elements, including the 1st paragraph edits, in a content worksheet in Excel. Each row represents a page. Then they can scan the rows and identify multiple pages at a time to help identify how many of them are duplicate.

    Of course, starting with proper IA from the beginning is the ultimate solution.

    Bookmarking this to share as a reference!

  • http://www.catawebonline.com/ Cataweb Online

    It all boils down to smart HTML and meta tagging.

  • http://www.vistastores.com/ Hiren Modi

    @Nichola Stott
    This is such a nice read about eCommerce copywriting. I read your entire post and have a question regarding the product page tab section and its content.

    I am selling many products on my eCommerce website that were made by the same manufacturer. So, I have added a Manufacturer Details tab on each product page. I think it may create duplicate content issues on my website. You can get more of an idea by visiting the following URL.

    http://www.vistastores.com/indoorlighting-elklighting-d1472.html

    Do you have any idea how to fix this issue? I am looking forward to your valuable reply! Again, such a nice blog post

  • Scott Rotton

    This is very difficult to do for sites such as mine. I run http://www.sex-toys-online-com.au which has 300 products; it's going to take an awfully long time to write a specific piece of text for all of them. Does anyone know any cheap services around that do this? If I got my SEO people to do it I would go bankrupt by the end of the month :-)
