The Rise of Content Scraping – is it Sharing or Stealing?

For any writer or blogger out there who develops a piece of content, you know the drill: take a brief, do your research and write about a topic that’s of interest and value not only to the industry but ultimately to the end reader. Once it’s published on the web, we share it amongst our social and email connections, watch it gain engagement and then enjoy it (whether the response is positive or negative) when we see people comment on it or share it further. Surely that’s one of the objectives of creating and publishing content, right?

Someone creates a piece that is relevant and compelling, offers insight, or shares their own knowledge or opinion on a topic, and away we go. But what happens when that same piece of content is then posted on multiple sites without the consent or knowledge of the writer or the company paying for it? And what does it do to the ‘credibility’ of that piece of content and of the source publisher?

I have recently noticed a really sharp rise in content being scraped off trusted sites and used elsewhere without any prior knowledge. The first you usually hear of it is via Twitter or even via search engine results, and if you dig deeper, you too will see lots and lots of instances of this happening.

Strange Appearances of State of Digital Content

Recently we saw two articles that had been published on State of Digital turn up on different sites. First up was Haukur Jarl, who wrote a compelling piece, “My Adwords Wishlist for 2014”, that had many of those who work in paid search nodding in agreement with the points he covered (or wished for).

My Adwords Wishlist for 2014 - Haukur Jarl

The exact same piece of content appeared on another site within an hour, with no change to the content at all apart from some very odd hyperlink styling laid over parts of the text.

Our very own Bas Van Den Beld was next with “How MSN Travel Handles its Content Marketing”, which ties in with the excellent e-book State of Digital has launched with Linkdex on how to create compelling and inspiring content for online.

How MSN Travel handles its Content Marketing - Bas Van Den Beld

Again, it appeared on another site within an hour, scraped wholesale with only a slight addition to the title URL.

Many will think, so what? It happens all the time. Others may be incensed, and some may actually be flattered that their content is being used by others. But having spoken to both Bas and Haukur and a few other authors, I found a bit of a mixed response to what is happening and why.

What are the Pros and Cons of Scraping Content?

Two sides to the discussion have come back. First, the pros:

The content has a wider audience reach than the publishing platform alone
Subscribers to the third-party site can read and engage with relevant content
The third-party site has plenty of fresh content
They don’t have to pay to create content
They could be seen as an authoritative site
The third-party site earns revenue from visitors through the ad space on its site

And the cons:

The creator and publisher have no control over the site it appears on
The site could be considered spam by search engines and create negative SEO when linking back (if nofollow isn’t added to the links)
Confusion in search engine rankings showing multiple instances of same content
No consent from the author or host site
Duplicate content: in some cases the originating site will rank #1, but the SERPs will still show listings for all the sites
Potential loss of traffic to the original hosting site
The cost and resource of creating the content sit with the original site

I would agree with most that, overall, the content being seen by a wider audience and potentially shared is a real benefit, and one of the reasons it’s created in the first place. But I also feel that scraping content like this can harm the integrity of the originating site, the author and the content in the long term. So I would really like to ask, and would welcome an answer: what’s the real value in doing this for the third-party site?

So if you are going to scrape content from a site, please:

Clearly credit the author and the originating site within the post
Follow SEO best practice for any links, so there is no chance of negative SEO
Make sure the content you scrape has relevance and context for what your site offers
Reach out and ask the author or originating site if it’s OK to do it (you never know, they may even offer to write some content for you)

I’m really interested to hear from anyone else who has feedback about this, whether you are for it or intensely against it, so let me know. In the meantime, I will check to see if this article gets scraped in the next few hours and contact the host sites.