Yesterday I read an extremely useful and practical walk-through on Page Level Search Engine Indexation, by Richard Baxter at SEO Gadget. In the post, Baxter looks at a variety of data sources, from the simplest, such as Webmaster Tools, which will tell you how many URLs are in your sitemap and how many of those are indexed (the difference therefore being of interest), to more robust methods of actually identifying which of the site’s pages are errant, in this case by building a Mozenda agent scraper. It is definitely worth a read for anyone working on large websites (particularly ecommerce sites) with multiple instances of a similar page type, e.g. product-level pages.

Setting aside issues of information architecture and internal and external linking, one of the most common issues we see at theMediaFlow when presented with sites that have page-level indexation problems is duplicate content. Cross-domain duplicate content carries related and additional challenges; in this case, however, I wanted to share some experiences of solving page indexation issues when duplicated content is present on product-level pages.

First off, duplicate content doesn’t necessarily mean that you have an exact replica of a page occurring again on the site. More often, the page is found to be so substantially similar to another that it is not indexed. Essentially, the page adds nothing of variety or additional value when compared to another page on the site.

The natural and simple response would be to vary the page content by hand; however, this problem is most common on large ecommerce websites, where such a solution is not always practical.
As an example, if a website sells screws, and happens to have fifty different sizes of screw, available in two different types of material for seven different applications (from exterior wood to interior masonry), then that’s 50 × 2 × 7 = 700 pages that could potentially be considered inherently the same.*

*That is imagining the site has been built to display one finite product type per page, which actually wouldn’t be best practice from an information architecture perspective; however, in this example we’re looking at how to solve problems after the fact.

If you have a large site with page-level indexation issues that seem to be attributable to substantially similar page-level content, here are a few points to consider:

1. Lead with the point of difference
Ensure your point of difference is “front-loaded” in the meta title and on-page title tags, e.g. “20 mm brass exterior-wood screw” as opposed to “exterior wood screws – 20mm brass”.
2. Remove boiler-plated text
“Boiler-plate” text is a passage or paragraph of standardised text, such as the “About the Company” bit at the foot of a press release. You may often find boiler-plated text on ecommerce sites, on all pages where the product includes a common feature. For example, all the pages about brass screws on our example site might carry a paragraph of exactly the same text about the quality of the brass and the ratio of copper to zinc used in the alloy. Instead, consider creating a page or new section on the site (“Our materials” or “Our quality materials”), housing such content there, and placing a link to the appropriate materials reference in the product page footer, as opposed to a wholesale duplicated paragraph.
3. Canonize a product page
To canonize a page, i.e. to select a preferred page out of a group of very similar pages and implement a canonical tag, is a solution explicitly evolved out of the issues posed by duplicate (or substantially similar) content.
- Group your substantially similar pages (e.g. all exterior wood screws in brass, from smallest to largest size)
- Identify a suitable page to be canonized (it would make sense to consider which page may already be the strongest by looking at internal and external links to these pages and consider also which product may sell best.)
- Make a note of the URL of the canonical page, e.g. http://example.com/screws/30mm-brass-exterior-wood.htm
- Implement the rel=canonical tag in the HTML head of all the substantially similar pages in the group:
<link rel="canonical" href="http://example.com/screws/30mm-brass-exterior-wood.htm"/>
If you’re unfamiliar with the canonical tag, be aware that there are a number of pitfalls, and that poor implementation often has serious implications. I would suggest you begin with this excellent post by Lindsay at SEOmoz, which is an extremely thorough guide to correct implementation of rel=canonical.
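For a large group of pages you will probably want to generate the tag rather than type it by hand. The following is only a minimal sketch of the grouping steps above, in Python: the function names and the 20mm and 40mm URLs are hypothetical, and how the tag actually gets into each page’s head depends entirely on your own templates or CMS.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Build a rel=canonical link element, escaping the URL for use in an attribute."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}"/>'


def tags_for_group(page_urls, canonical_url):
    """Map every page in a group of similar pages to the tag for its HTML head."""
    tag = canonical_link_tag(canonical_url)
    return {url: tag for url in page_urls}


# Hypothetical group: all brass exterior-wood screws (sizes are illustrative).
group = [
    "http://example.com/screws/20mm-brass-exterior-wood.htm",
    "http://example.com/screws/30mm-brass-exterior-wood.htm",
    "http://example.com/screws/40mm-brass-exterior-wood.htm",
]
canonical = "http://example.com/screws/30mm-brass-exterior-wood.htm"

tags = tags_for_group(group, canonical)
for url, tag in tags.items():
    print(url, "->", tag)
```

Every page in the group, including the canonical page itself, receives the same tag pointing at the one preferred URL.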
4. Handwrite a bespoke original sentence per page
Add a sentence or two of completely bespoke text, as a product description or bulleted feature line, to each product page. On occasion you may find that this is all that is required; however, if you have an extremely large number of substantially similar pages, or if this solution isn’t practical for whatever reason, then I would strongly recommend you at least add some unique text to your canonized page.
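If you want a rough idea of which pages still read as near-copies of one another after editing, you can compare their text yourself. The sketch below uses Python’s standard difflib module to score word-level similarity between pairs of pages. The page paths, sample text and the 0.8 threshold are all assumptions made for illustration; search engines do not publish how they judge similarity, so treat the score as a prompt for manual review and nothing more.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(text_a: str, text_b: str) -> float:
    """Return a 0..1 similarity score based on matching runs of words."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()


def near_duplicates(pages: dict, threshold: float = 0.8):
    """List page pairs whose text similarity meets the (arbitrary) threshold."""
    flagged = []
    for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
        score = similarity(text_a, text_b)
        if score >= threshold:
            flagged.append((url_a, url_b, round(score, 2)))
    return flagged


# Hypothetical page text keyed by a hypothetical path.
pages = {
    "/screws/20mm-brass": "20 mm brass exterior-wood screw. Made from quality brass.",
    "/screws/30mm-brass": "30 mm brass exterior-wood screw. Made from quality brass.",
    "/screws/30mm-steel": "30 mm steel screw for interior masonry, with a bespoke description written by hand.",
}

for url_a, url_b, score in near_duplicates(pages):
    print(f"{url_a} and {url_b} look very similar (score {score})")
```

Pairs that are flagged are candidates for a bespoke sentence, for boiler-plate removal, or for grouping under a single canonical page.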
Finally, whilst this post looks at ways to solve existing duplicate content issues as an explicit cause of incomplete indexation, it must be noted that such issues can often be avoided completely with good information architecture at the outset.