
The Rise of Content Scraping – is it Sharing or Stealing?

27 February 2014 BY Russell O’Sullivan


Any writer or blogger who develops a piece of content knows the drill: take a brief, do the research and write about a topic that is of interest and value not only to the industry but ultimately to the end reader. Once it’s published on the web, we share it among our social and email connections, watch it gain engagement, and enjoy seeing people comment on it or share it further, whether the response is positive or negative. Surely that’s one of the objectives of creating and publishing content, right?

Someone creates a piece that is relevant and compelling, offers insight, and shares their own knowledge or opinion on a topic, and away we go. But what happens when that same piece of content is then posted on multiple sites without the consent or knowledge of the writer or the company paying for it? And what does it do to the ‘credibility’ of that piece of content and the source publisher?

I have recently noticed a sharp rise in content being scraped off trusted sites and used elsewhere without any prior knowledge. The first you know about it is usually via Twitter or in search engine results. If you dig deeper, you too will see lots of instances of this happening.

Strange Appearances of State of Digital Content

Recently we saw two articles that had been published on State of Digital turn up on different sites. First was Haukur Jarl’s compelling piece, “My Adwords Wishlist for 2014”, which had many of those who work in paid search nodding in agreement with the points he covered (or wished for).

My Adwords Wishlist for 2014 - Haukur Jarl

The exact same piece of content appeared on another site within an hour, with no change to the content at all, apart from some very odd hyperlink-style action over the top pieces.

Our very own Bas Van Den Beld was next up with “How MSN Travel Handles its Content Marketing”, where State of Digital has launched an excellent e-book with Linkdex on how to create compelling and inspiring content for online.

How MSN Travel handles its Content Marketing - Bas Van Den Beld

Again it appeared on another site within an hour, scraped, with a slight addition to the title URL.

Many will think, so what? It happens all the time. Others may be incensed, and some actually flattered that their content is being used by others. But having spoken to both Bas and Haukur and a few other authors, I found a mixed response to what is happening and why.

What are the Pros and Cons of Scraping Content?

There are two sides to the discussion that have come back. Firstly, the pros:

  • The content has a wider audience reach than the publishing platform alone
  • Subscribers to the “3rd party” site can read and engage with relevant content
  • The 3rd party site has plenty of fresh content
  • They don’t have to pay to create content
  • They could be seen as an authoritative site
  • 3rd party site makes revenue through ad space on its site from visitors

And the cons:

  • Creator and publisher have no control over the site it’s on
  • Site could be considered spam by search engines and create negative SEO when linking back (if nofollow isn’t added to links)
  • Confusion in search engine rankings showing multiple instances of same content
  • No consent from the author or host site
  • In some cases duplicate content – the originating site will be ranked #1, but SERPs will show the listings for all sites
  • Potential loss of traffic to the original hosting site
  • Cost and resource of creating the content sits with the original site

I would agree with most that, overall, having the content seen by a wider audience and potentially shared is a real benefit and one of the reasons it’s created. But I also feel that scraping content like this can harm the integrity of the originating site, the author and the content in the long term. So I would really ask, and welcome an answer: what’s the real value in doing this for the 3rd party site?

So if you are going to scrape content from a site please:

  • Make sure to clearly credit the author and the originator site within the post
  • Use best practice SEO for any links (such as adding nofollow) so there is no chance of negative SEO
  • Make sure the content you scrape has relevance and context to what your site offers
  • Reach out and ask the author or originator if it’s OK to do it (you never know, they may even offer to write some content for you)
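
For anyone wondering what the first two points might look like in practice, here is a minimal sketch in Python of a credit line that names the author and originator site and links back with nofollow. The function name, the CSS class and the example URL are all made up for illustration; they are not taken from any particular site or CMS.

```python
from html import escape


def attribution_html(author: str, source_name: str, source_url: str) -> str:
    """Build a credit line that names the author and the originator site.

    The link back carries rel="nofollow", as suggested in the list above.
    All values are HTML-escaped so they are safe to drop into a page.
    """
    return (
        '<p class="attribution">'  # hypothetical class name
        f"Originally written by {escape(author)} for "
        f'<a href="{escape(source_url, quote=True)}" rel="nofollow">'
        f"{escape(source_name)}</a>."
        "</p>"
    )


if __name__ == "__main__":
    # example.com is a placeholder, not the real article URL
    print(attribution_html("Haukur Jarl", "State of Digital",
                           "http://example.com/original-post"))
```

Whether nofollow is the right choice for a given link is a judgment call for the republishing site; the sketch only shows where it would go.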

I’m really interested in feedback from anyone else on this. Whether you are for it or intensely against it, let me know. In the meantime I will check to see if this article gets scraped in the next few hours and contact the host sites.
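
If you want to run that kind of check yourself, one rough way is to compare the text of your article with the text of a suspected copy. The sketch below is only an assumption about how that could be done, not the method used at State of Digital: it splits both texts into overlapping runs of words ("shingles") and measures how many runs they share (Jaccard similarity). The function names and the default run length of 5 words are illustrative choices.

```python
import re


def shingles(text: str, k: int = 5) -> set:
    """Return the set of overlapping k-word runs in the text (lowercased)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than k words: treat the whole text as one run
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(original: str, candidate: str, k: int = 5) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    a, b = shingles(original, k), shingles(candidate, k)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    mine = "the quick brown fox"
    theirs = "the quick brown dog"
    # With 2-word runs, two of the four distinct runs are shared
    print(similarity(mine, theirs, k=2))  # 0.5
```

A score close to 1.0 suggests a near-verbatim copy. Where to draw the line for "scraped" is up to you, and you would still need to fetch and strip the text of the other page yourself.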

AUTHORED BY:

Russell O’Sullivan is an all-round digital marketing and ecommerce manager. With over 15 years of experience in the digital environment, he has worked across varied disciplines such as Content Strategy, PPC, SEO, Ecommerce, Social Media, Web Design and UX.
  • http://snafflepuss.wordpress.com/ Nicole Healing

    I see this all the time on sites like Mashable and Buzzfeed, but they do point back to the author/original page. I like to play a game with them and beat them to seeing the content before it reaches their sites. *I should get out more*

  • Laura Phillips

    Great post Russell :) Personally I am mostly against it for exactly the reasons you described. It’s rare in my experience that manners and best practice are involved. Sadly the original does not always outrank the scraped content. Google have in fact created a tool so we can tell them when this happens. This of course means it is a real problem, but it also means they are refining how to deal with it.

  • Eric Dahlinger

    “So if you are going to scrape content from a site please:…” I’m not sure that “please” gets the job done. Scrapers are more or less stealing content, visitors and the cost of creating the content originally, as was noted in the article.
    Not cool.
    A link to the original content would be preferable in all cases, perhaps with a short summary. Hijacking the entire content and giving the appearance of creation seems highly deceitful.
    The argument of higher rankings can, but doesn’t always, work for the content creator.

    • Russell O’Sullivan

      Hi Eric

      I have used the “polite” way of trying to resolve content scraping. Recently one of the State of Digital team contacted the web admin of a site to ask just that, and within 48 hours the content stopped being scraped. So it has been shown to work. You may need a stronger line of defense with some other sites though, so you will need to be prepared to keep pushing.

      I agree, the link to the author or original site is necessary to show the reader where it actually came from, while making sure that nofollows are in place.
      Thanks for your comments.
