SMX East: Duplication, Aggregation, Syndication, Affiliates, Scraping & Information Architecture

16th September 2011

Coverage of SMX New York 2011 is provided by our guest author Jackie Hole.
This was a hardcore session from the ‘Technical SEO’ track. With so much to cover in one session, most of the talks were overviews, so additional reading is definitely required.

  • Brian Cosgrove – TPG
  • Vanessa Fox – Search Engine Land
  • Matt Heist – High Gear Media

Speaker #1 Brian Cosgrove – Thin Content, Low Quality, PageRank dilution

Thin content is seen as a big problem on the web today. There are many reasons for it, but they are not necessarily technical decisions. Brian believes the issue is process-oriented, and the challenges include feeds from suppliers, too many categories, and items that are too similar.

Top 10 Checklist Takeaways:

  • Any sort of large set of data you bring in that was generated from somewhere else is not unique content
  • Trying to rank for too many things with complex or similar categories can dilute the focus
  • Avoid ‘mindless’ content that does not speak to the audience
  • Focus on category pages when developing clean unique content
  • Have an SEO strategy that is related to a defined business goal
  • Map your keywords so that you are not targeting the same keywords on multiple pages
  • Have a content strategy – if outsourcing content, define teams and roles, and provide timelines and goals for content production
  • Develop a style guide for voice, tone, quality, updates to maintain consistency
  • Create a content calendar to prioritise content and dates, and anticipate space for breaking news
  • Do an SEO review before undertaking work and have a full brief
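
The keyword mapping point in the checklist above can be checked mechanically. The following is a minimal sketch, not something shown in the session: it assumes you keep a simple map of page URL to target keywords, and the function name `find_keyword_conflicts` and the sample URLs and keywords are all hypothetical.

```python
from collections import defaultdict


def find_keyword_conflicts(keyword_map):
    """Return keywords that are targeted by more than one page."""
    pages_by_keyword = defaultdict(list)
    for page, keywords in keyword_map.items():
        for keyword in keywords:
            # Normalise case and whitespace so "Best SUV" matches "best suv".
            normalised = " ".join(keyword.lower().split())
            if page not in pages_by_keyword[normalised]:
                pages_by_keyword[normalised].append(page)
    return {kw: pages for kw, pages in pages_by_keyword.items() if len(pages) > 1}


# Hypothetical keyword map: page URL -> keywords that page targets.
keyword_map = {
    "/suvs/": ["SUV reviews", "best suv"],
    "/crossovers/": ["crossover reviews", "Best SUV"],
    "/trucks/": ["truck reviews"],
}

conflicts = find_keyword_conflicts(keyword_map)
print(conflicts)
```

Any keyword that comes back in `conflicts` is being targeted on more than one page and is a candidate for consolidation or re-mapping.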

Quote of the session: Writing unique content is the cost-of-entry for SEO…

Speaker #2 Matt Heist – Automotive Case Study: Dealing with Duplicate Content

This session covered content from a publisher’s experience, as opposed to an SEO agency perspective. With 7 writers publishing 1,200 pieces of content a month, Matt had some useful information on duplicate content based on personal experience.

The idea was to launch as many sites as possible in hyper-niche topics to build an audience across a lot of websites. Although the company invested in high-quality content, they launched 100 sites with the same content focus, so being hit for duplicate content and by Panda was inevitable.


  • Too much content and you compete against yourself
  • Oversharing of content between websites
  • Undifferentiated look between sites
  • Investing in original quality content but putting it everywhere


  • Eliminate non-core sites with 301s (from 105+ down to 7 core sites)
  • Properly ‘canonicalise’ duplicate content
  • Keep aggregated content with strong user engagement
  • Use unique expert reviews for each target segment
  • Re-design your website and strip out as much unnecessary junk as possible
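
The 301 and canonical points in the list above might look something like the sketch below. It is an illustration only, not Matt’s actual setup: the `.example` domains, the `RETIRED_TO_CORE` mapping and both function names are hypothetical, and in practice the redirect would normally live in the web server or CDN configuration rather than in application code.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping of retired niche domains to the core site that replaces them.
RETIRED_TO_CORE = {
    "niche-hybrids.example": "core-cars.example",
    "niche-sedans.example": "core-cars.example",
}


def redirect_for(url):
    """Return (301, new_url) if the URL's host has been retired, else None."""
    parts = urlsplit(url)
    target_host = RETIRED_TO_CORE.get(parts.hostname or "")
    if target_host is None:
        return None
    # Keep the path and query so each old page lands on its equivalent new page.
    new_url = urlunsplit((parts.scheme, target_host, parts.path, parts.query, ""))
    return 301, new_url


def canonical_tag(preferred_url):
    """Build the <link rel="canonical"> element for a duplicate page's <head>."""
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


print(redirect_for("http://niche-hybrids.example/reviews/prius?page=2"))
print(canonical_tag("http://core-cars.example/reviews/prius"))
```

The redirect is for sites that are going away completely. The canonical tag is for duplicate pages that stay live but should pass their credit to one preferred URL.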

Takeaways: Don’t take your niche too far, make fewer sites, differentiation matters, premium content is costly but worth it.

Speaker #3 Vanessa Fox – Approximate Duplication and Information Architecture

If you have duplication problems, where should you start and what should you do?

Vanessa has been working with a Federal Government task force to clean up over 24,000 federal government websites, which are often put online for any and every policy initiative. Many domains are duplicate or outdated. (For example, there are 14 websites surrounding student aid.)

Sites can suffer from several types of duplication. ‘Approximate duplication’ occurs when topics overlap, which results in overall confusion.


  • Find out whether you have an approximate duplication problem by performing a site search. Example: site: counting calories (60,000 results). If there are thousands of pages for one keyword term, many of them are probably not necessary
  • Use search personas to identify your audience, so you can present specific, useful information under the major topics
  • Focus on the important pages and get rid of the thin content
  • Mapping shows you where the gaps are in your content
  • Look at your information architecture – if you can’t get it into a nice hierarchical structure, you may have a duplication problem
  • If you understand your audience it is easier to consolidate your information
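
Beyond the site search suggested above, overlapping pages can also be flagged programmatically. The sketch below uses word shingles and Jaccard similarity, which is a common general technique and not something Vanessa presented. The function names, the 0.5 threshold and the sample page texts are all assumptions for illustration.

```python
from itertools import combinations


def shingles(text, size=3):
    """Split text into a set of overlapping word n-grams ("shingles")."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(pages, threshold=0.5):
    """Return (url_a, url_b, score) for every page pair at or above the threshold."""
    sets = {url: shingles(text) for url, text in pages.items()}
    results = []
    for url_a, url_b in combinations(sorted(sets), 2):
        score = jaccard(sets[url_a], sets[url_b])
        if score >= threshold:
            results.append((url_a, url_b, score))
    return results


# Hypothetical page texts; a real run would use the extracted body copy of each page.
pages = {
    "/a": "how to count calories in a healthy daily diet",
    "/b": "how to count calories in a healthy weekly diet",
    "/c": "applying for federal student aid online",
}

matches = near_duplicates(pages)
print(matches)
```

Pairs that score highly are candidates for consolidation. The threshold needs tuning per site, and comparing every pair of pages gets slow on very large sites.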

Be sure to also look at our overview page of SMX East 2011

The coverage of SMX East on State of Search is in part made possible by sponsorship from Majestic SEO, who have the largest Link Intelligence database in Search. To get your free trial, give a card to this blogger in person at the conference or drop us an e-mail.


Written By
Jackie Hole is an award-winning Search Marketing consultant specialising in Paid Search, Conversion Improvement and Organic Search / SEO for the USA, Canada and European markets. Starting out in Multimedia & Interface Design, Jackie has over 15 years’ experience in online marketing and was recently awarded European Search Personality...