If you have been blogging for a while, chances are you are familiar with content scrapers. Content scrapers are websites that steal your content for their own blogs without your permission. Some content scrapers will just copy the content off of your blog, but most use automated software that takes the content from your RSS feed and posts your content to their site like it is a new post.
In this post, we are going to look at some potential link building benefits to content scrapers, how to find out what sites are scraping your content, and what you can do if you want to either benefit from the linking standpoint or have them take it down.
Linking Benefits of Content Scrapers
Last week, I was happy to see that I was listed in ProBlogger’s 20 Bloggers to Watch in 2012. Within 24 hours, I received a notification in my WordPress dashboard that a page on my blog had been linked to in the post on ProBlogger’s site.

After receiving the original notification from the ProBlogger post, I also received another 18 trackbacks from sites that had stolen the content in their post verbatim. Trackbacks are WordPress’ way of letting you know that another website has linked to a post on your blog. In this case, these 18 sites had posted the content exactly like the original post – with the links back to my blog still intact.
It was then that I started contemplating the potential link building benefits of content scrapers. These are not by any means quality links – the highest Google PageRank was a PR 2 domain, many were stealing content in a variety of languages, and one even had the nerve to use some kind of redirection script to take away the link juice of outgoing links! So while these links didn’t have the same authority that the original post had, they still count as links.
How to Catch Content Scrapers
Unfortunately, unless you want to continuously search for your post titles in Google, you’ll only be able to easily track down sites that keep your in-content links active. If you want to know what websites are scraping your content, here are a few tips to sniff them out.
Copyscape
Copyscape is a simple search engine that allows you to enter the URL of your content to find out if there are duplicates of it on the Internet. You can get a few results using their free search, or you can pay for a premium account to check up to 10,000 pages on your site and more.
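If you would rather compare a suspicious page against your own post yourself, here is a minimal sketch of one common duplicate-detection idea: word "shingles" scored with Jaccard similarity. This is not how Copyscape works internally (that is not public); the function names and the five-word shingle size are assumptions chosen for illustration.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Too short for a full shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(original, candidate, size=5):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    a = shingles(original, size)
    b = shingles(candidate, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A score close to 1.0 suggests a verbatim copy, while a score near 0.0 suggests unrelated text. Where to draw the line in between is a judgment call you would need to tune against your own posts.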
Trackbacks
Another way is through your trackbacks in WordPress (as shown in the image above). Many of these will show up in the spam folder if you use Akismet. The key to getting trackbacks to appear from content scrapers is to always include links to other posts in your content. Be sure those links have great anchor text too, if you’re going for a little extra link juice. And even if you are not, internal linking with strong anchor text is good for your on-site optimization too!
Webmaster Tools
The next way to catch them is in Webmaster Tools. Simply go to your site in Webmaster Tools, and look under Your Site on the Web > Links to Your Site. Then sort by the Linked Pages column.

Anyone thinking about link building benefits at this point is probably noting the sheer volume of links from these sites, some of which are content scrapers. Essentially any site that is linking to a lot of your posts that isn’t a social network, social bookmarking site, or a die-hard fan who just loves linking to you is potentially a content scraper. You’ll have to go to their website to be sure. To find your links on their site, click on one of the domains to see the details of what pages on your site they are linking to specifically.

Then, click on one of your links to see which pages on their site are linking to yours.

You can see here that they are just blatantly copying my post titles. When I visited one of the links, sure enough, they were copying my entire posts in their full glory onto their site.
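If you can get your link data out as a list of (linking page, linked page) pairs, the "who links to a lot of my posts" check can be roughed out in a few lines. This is only a sketch: the input format, the `KNOWN_GOOD` allowlist, and the `min_pages` threshold are all assumptions for illustration, not anything Webmaster Tools provides in this shape.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Assumed allowlist: domains that link to many posts for legitimate reasons.
KNOWN_GOOD = {"twitter.com", "facebook.com"}


def likely_scrapers(links, min_pages=3, known_good=KNOWN_GOOD):
    """Count how many distinct pages of yours each external domain links to.

    links: iterable of (linking_page_url, your_page_url) pairs.
    Returns {domain: page_count} for domains at or above min_pages.
    """
    pages_by_domain = defaultdict(set)
    for source_url, target_url in links:
        domain = urlparse(source_url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        if domain and domain not in known_good:
            pages_by_domain[domain].add(target_url)
    return {
        domain: len(pages)
        for domain, pages in pages_by_domain.items()
        if len(pages) >= min_pages
    }
```

Anything this flags is only a candidate. As noted above, you still have to visit the site to be sure it is a scraper and not a die-hard fan.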
Google Alerts
If you don’t post often or want to keep up with any mentions of your top blog posts on other websites, you can create a Google Alert using the exact match for your post’s title by putting the title in quotation marks.

I deliver all of my Google Alerts to an RSS feed so I can manage them in Google Reader, but you can also have them delivered regularly by email. You’ll even get an instant preview of the types of results you will get.

How to Get Credit for Scraped Posts
If you use WordPress, then you definitely want to try out the RSS footer plugin. This plugin allows you to place a custom piece of text at the top or bottom of your RSS feed content.

The result is this simple line on my blog posts when viewed through an RSS feed.

As you can see, even if you aren’t using it for the purpose of getting credit back to your posts when content thieves steal it, you can still use it for a little extra bit of advertising with the possible benefit of people who subscribe to your RSS feed clicking through to your website or social profiles. And when someone does scrape your content from your RSS feed, it shows up there too.

So in the event that someone finds your scraped content, they will hopefully notice the credit before assuming it was created by the blog that stole it. If you don’t have WordPress, you can simply include a note at the top or bottom of your content that includes the same information.
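For anyone generating their feed outside of WordPress, the same idea can be sketched in a few lines of Python. This is not the RSS footer plugin's code; the function name, parameters, and the wording of the credit line are assumptions made up for illustration.

```python
from html import escape


def add_feed_footer(content_html, post_title, post_url, site_name, site_url):
    """Append an attribution line, with links, to a feed item's HTML content."""
    footer = (
        f'<p>The post <a href="{escape(post_url)}">{escape(post_title)}</a> '
        f'appeared first on <a href="{escape(site_url)}">{escape(site_name)}</a>.</p>'
    )
    return content_html + "\n" + footer
```

You would call this on each item's content just before writing the feed, so the credit travels with the post wherever the feed is republished.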
How to Stop Content Scrapers
If you’re not interested in anyone copying your content, then you have a few options to choose from. You can start by contacting the site that is stealing your content and sending them a notice that you want all of your content removed immediately. You can do this through the site’s contact form, email address, or post it to any social accounts they list.
If there is no contact information on the website stealing your content, you can do a Whois Lookup to (hopefully) find out who owns the domain.

If it is not privately registered, you should find an administrative contact’s email address. If not, you should at least see the domain registrar which, in this case, is GoDaddy and/or the hosting company for the website which, in this case, is HostGator. You can try to contact both companies (HostGator has a DMCA form and GoDaddy has an email) and let them know that the domain in question is stealing copyrighted content in hopes that the website will be suspended or removed.
You can also visit DMCA.com and use their takedown services to get content removed from any site that is copying your photos, video, audio, blog, or other content. They even offer a WordPress plugin to incorporate a DMCA protected badge on your site to warn potential thieves.
Have you ever dealt with content scrapers and thieves? Do you leave it alone for the link benefits, or do you fight back? What other tools, services, or other preventative tactics do you use to block content scrapers? Please share your thoughts and experiences in the comments!
About the Author: Kristi Hines is a freelance writer, professional blogger, and social media enthusiast. Her blog Kikolani focuses on blog marketing for personal, professional, and business bloggers. You can follow her on Google+, Twitter, and Facebook.


Great post, Kristi.
I’m always on the lookout for content thieves.
Thanks PenPoint! :)
In the past I’ve used Tynt. You add a script to your blog and it reports what has been copied.
I hadn’t heard of that Marlene – will have to check into it. Thanks! :)
I think it’s very important to make a distinction between content scrapers and content curators.
Content scrapers are thieves — that is, they take your writing and post it, without permission or attribution, as their own work.
Content curators — especially dedicated ones — create carefully-selected collections of posts for specific audiences, that cross many different sources…and include only the headline and a snippet of text, as well as tags that help guide the audience to a set of sub-topics.
I’m a content curator, editing a relatively new site called “Rich Content Daily” (http://www.RichContentDaily.com). I ran across your post because I’m always on the lookout for interesting and important articles for practitioners and creators of rich content for content marketing and online learning.
I believe that content curators help bloggers for several reasons: 1.) they expose their content constantly to relevant new audiences; and 2.) they offer the SEO “juice” that comes with inbound links. Our objective is always to encourage click-through to the original article, so we only publish a “teaser” snippet, always being careful not to “give away the punchline” of the article. (In our case, we also create original content ourselves on the site.)
Do you agree with this important distinction between scrapers and curators — especially when the curation is carefully hand-done, as we do at Rich Content Daily? Do you have any advice for curators on how we can be even more helpful to authors whose work we so much respect?
Scrapers and curators are two entirely different creatures, imo. If you’re not publishing the entire article, you aren’t a scraper.
I won’t speak for anyone else, but we love curators. :)
Hi Michael! I do believe there are good ways to go about content curation, and so long as you are getting permission from the author and crediting them in the repost, then there’s nothing wrong with it at all. I’ve given permission to a few curation sites – I basically look for a site where I can expand my own reach.
As far as advice, I would say that you need to put up the reasons why someone should want content on your site. Traffic stats, future plans, how it will be displayed, customized byline, etc. Maybe even some control as to which posts will be curated and which won’t, like letting the author submit a category feed instead of their main one. :)
I’ve seen a lot of sites copy my content verbatim, leaving links intact, but they always show that the post was written by me and my name is linked back to my site. Technically, is that scraping?
I consider any site that uses your full content without your permission a form of content theft. Scraping is really the process for grabbing the content – specifically the software / plugins that will “scrape” content from RSS feeds. Some of those will do it with full attribution, others will just grab what is there, and some sites specifically strip any links or originating author information.
Excellent post, especially about how to turn content theft to your advantage…
I figure if you can’t fight it, benefit from it! :)
Kristi, I love that you’ve found the “bright side” of the issue. We could complain all day about the scrapers and make ourselves crazy fuming, but they aren’t going away any time soon. I’d rather join you in making lemonade! Thanks for pointing us to the sugar.
It’s hard to justify sometimes, but if you can find benefits that make it feel like you’re not completely losing out, it helps.
Great post Kristi!
I installed the RSS-footer immediately and it works like a charm!
Thanks for the tip!
Cheers,
Kris
https://twitter.com/KrisOlin
PS. I’m now following you on Twitter as well.
You’re welcome! I love that plugin! :)
Is it worth hunting them down? If you run a site with several new posts a day, it’ll be a lot of work keeping track of them and hunting them down.
It depends on how proprietary you want to keep your content. I know some people that do hunt them down and stand up to them. I really don’t have time to do it, but sometimes I will check out the sites to at least make sure they are not posting my content next to something offensive.
Really enjoy the rss footer options
Hi! It is very useful information. Thanks for sharing. Now. I can catch content thieves.
Awesome post, Kristi! My content, but especially my photos, are hijacked fairly frequently. I watermark all my photos, and sometimes the thieves have the nerve to crop out the watermark. I have started watermarking them higher up in the photo and more prominently. My content is also frequently plagiarized. People say that copying is the sincerest form of flattery. I really don’t want to be flattered in that way!
For me, the biggest problem is website designers who steal images off Google Images to make their clients’ sites pretty. Very often, the client is totally unaware where the photos came from or that they are infringing on someone’s copyright. I received a nasty email from a web designer after sending him a copy of the US Copyright law (he said the photos didn’t say “copyright” so he was free to use them – he was wrong). He did remove the photos, but not without a fight. I doubt he will ever use another of my photos, but he made it clear he was not going to change his ways.
Thank you for this great information (I love all the details) and for getting the word out.
You might try contacting the client in whose material the image appears, and send them a bill for the licensing fee for the image, with an explanation that their web designer has illegally used your material.
I’ve seen it done and I’ve seen it work. Even if the bill doesn’t get paid, you make the designer look horrible. Which he/she should.
Hi Michelle! I’d like to think that copying is flattery, but a lot of copying these days comes from automation based on a keyword. Flattery would be a bit better comparatively. :)
When it comes to photos, that is definitely something you want to fight for. My husband sells his photos as prints, so it’s almost product theft when his are taken. He uses the meta data option in Lightroom to automatically add his name, website, and copyright information into each photo he imports. I’ve noticed that when you upload one of his photos, that info actually pops up in the details in WordPress. I’m not sure what program you use, but you could put your copyright info in that way so that it is on each individual photo.
I used Copyscape and found that someone has copied the content from my social media blog. But what would be the next step?
Hi Jonathan. The next steps are listed in the last section of this post under “How to Stop Content Scrapers.” Basically you can report the site to their hosting company or work with the DMCA to do a takedown. Good luck!
Kristi, thanks for pointing out what most people don’t seem to realize — if you include lots of internal links in your content (which is a good practice anyway), scrapers don’t do much harm. (I’m talking written content, photographers should IMO go after those who steal their work.)
If we spent any time going after the hundreds of scrapers who lift Copyblogger posts, we’d have to choose what productive activity to stop doing instead. It’s annoying, but there are worse annoyances on the web.
I really like that RSS footer plugin.
Hi Sonia! I know what you mean – tracking down people who lift content from my site wouldn’t be so bad, but if I started going on the lookout for the other sites I write for, then I’d be in for a time-consuming fight. I figure even if they are lower quality, having the backlinks is nice. In the event that someone did find a scraped post over the original, at least they’ll be pointed back in the right direction. Glad you like the plugin! :)
I am so glad that I found this. I hate thieves. I work hard to create my own content. I pay for my images. Why should people get away with stealing my hard work? I have +1d this, liked it, shared it and now I’m going to bookmark it. After that I’m going on a crusade. If I can get a thief’s blog or site shut down, I will. If everybody took action, they would think twice about their parasitic behaviour.
Thanks for sharing this post Steve. As far as the images, if you pay for the image license, and the content is scraped, can you report the thief to the site/artist that licensed the image? Then maybe you can have them fight it too?
Your suggestion about adding in an RSS footer, including links and simple CTAs, is gold! So simple, but one I hadn’t thought of myself. :)
I used to have the plugin on my site, then forgot to reinstall it after I had to move the site to a new host. Then I saw it on another of my subscriptions and remembered just how genius it was!
Agree with Simone to a great extent that if you are likely to get scraped make sure there are plenty of full url address links (not relative ones) back to internal content to minimise any possible detrimental impact. Most scrapers will maintain content text links.
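As a rough illustration of the full-URL point, here is a small Python sketch that rewrites relative `href` values to absolute ones before content goes out in a feed. The function name and the regex-based approach are assumptions for illustration; a regex is fragile on messy HTML, so a real HTML parser would be the safer choice for anything beyond simple posts.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or href='...' and captures the quote style and the URL.
HREF_RE = re.compile(r'(href\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE)


def absolutize_links(html, base_url):
    """Rewrite relative href values so they point at base_url's site."""
    def fix(match):
        prefix, quote, url = match.groups()
        return f"{prefix}{quote}{urljoin(base_url, url)}{quote}"
    return HREF_RE.sub(fix, html)
```

Links that are already absolute are left unchanged, so it should be safe to run over content that mixes both kinds.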
Do they not say ‘is plagiarism or imitation not the sincerest form of flattery’!
Having said that, if the only objective is to attract visitors to a low-quality site mainly for AdSense or other advertising purposes, the recent Panda/Farmer update and subsequent algorithm changes may have addressed some of the issues with low-quality scraper sites.
If I remember rightly, Matt Cutts, Google’s Spam Master, stated that although duplicate content is an issue, their algorithm can and does make every effort to identify the original content, giving it attribution and higher ranking weight than obvious straight copies.
Google does a good job of ranking the scrapers beneath the original Rob, but I have still found some cases where I’m searching for an article and come across the duplicate instead of the original. And I have seen some scraper sites actually have comments on the stolen posts too! That’s why I think sticking an attribution in is important, just in case people find the duplicate first.
Great tips on RSS footer options for WordPress. Thanks.
Glad I discovered you on the new My SEO Community site.
You’re welcome Jay! Glad you discovered us too! :)
Really thorough post. I definitely learned something today, although, thankfully I haven’t had to deal with it firsthand (yet!). I am going to add the RSS footer that you describe today, because you are so right that we can’t prevent scrapers completely so we might as well do what we can to benefit from their actions as much as possible.
I did see you listed on the ProBlogger list of bloggers to watch in 2012. After reading this post, I’ll definitely be back.
Heck yes, everyone should fight back to help make this problem shrink! The question we have is what if it is a press release? We link back, but do we have to since the release is for journalists?
I’m not sure how it works with press releases… most of the scraping seems to happen with bloggers.
I only knew about Copyscape, I never realised there were so many ways to find out who is nicking your content. I was even more surprised to learn I could find out via Google Analytics. Thanks Kristi. :)
Thank you for the tip on the plugin! I installed it immediately. I often find my RSS feeds on other scraper sites. Thank you!
Good job, Kristi… I never knew Webmaster Tools had such features in it. I will explore them tonight, because our website content has appeared on several blogs and I need to track them down. I was looking for an application to find them, and it turns out Webmaster Tools has what I really want.
The only time I happen to notice my content or images that appear elsewhere is because I happen to be checking logs and notice a bunch of incoming traffic from the same site or page. It tends to be other sites using my screen shot images. It is a bit annoying. I don’t have a lot of time to contact the site owner, host, or dmca. It does make me wonder how much is actually copied, scraped, etc. and many people probably don’t even realize it.
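For anyone who wants to make that log check less of an accident, here is a minimal Python sketch that tallies external referrers from access log lines. It assumes the common Apache/Nginx "combined" log format, and the function name and `own_domain` parameter are made up for illustration. Adjust the pattern if your server logs in a different layout.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumes "combined" format: ... "GET /path HTTP/1.1" 200 1234 "referrer" "agent"
LINE_RE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" \d{3} \S+ "([^"]*)"')


def external_referrers(log_lines, own_domain):
    """Count requests per referring domain, ignoring your own site."""
    counts = Counter()
    for line in log_lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        referrer = urlparse(match.group(2)).netloc.lower()
        is_own = referrer == own_domain or referrer.endswith("." + own_domain)
        if referrer and not is_own:
            counts[referrer] += 1
    return counts
```

Calling `external_referrers(lines, "yourblog.example").most_common(10)` would list the ten busiest outside referrers, which is a reasonable place to start looking for hotlinked images.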
I really learnt a lot through this article. The images made it easy.
Very interesting article, and there are a couple of links I should definitely check. As for me, I just publish my RSS as summaries and not full posts. It works for now, but one of the tools I have to check is the RSS footer plugin.
Thanks for the interesting article.
Well, lots of people having their own blogs have to deal with this problem of content scraping. This is the same problem with me as I have a blog on umrah packages and I have to suffer a lot due to the theft of my content but now I will check out the tools suggested by you. Your suggestions are worth practicing.
Great post, Kristi, and thanks for the link to the WordPress RSS footer plugin. As you say, it’s a brilliant way to get some credit back even if they are low quality sites. Better than nothing at all, isn’t it?
Thanks so much Kristi!!! I’ll have to go check my other blogs but I know for certain that this is happening regularly with one of them… I had been wondering what all of those WordPress Trackbacks were and if it was a good thing or not. Now I know and I know what to do about it.
Most bloggers are facing the same problem, including me. This post gives valuable information. Thanks for this post.
Content thieves really annoy me, so thanks for your suggestions.
We take the time to create good unique content and some **** steals it and half the time doesn’t even bother to spin it.
Lazy scumbags.
Thanks for the invaluable info!
That makes two of your posts I now need to take action on after seeing them. You are def keeping me busy :)
I am glad that you point out that there are some situations where content scrapers benefit the sites from which they retrieve information.
A good example of “white hat” web scraping is my horses for sale platform at http://horses.fm, Horses Farm & Market. It indexes horse for sale ads from other sites and compiles them into a single, searchable list. It gives a very brief summation of each external ad with a link to the original ad, and it also has Twitter, Facebook, and Google+ share buttons for each external ad as a courtesy to those sites. The intent is to augment the external services by providing a layer of greater functionality and accessibility.
Thanks so much for the WP “trackbacks” clarification. I usually just spam or trash them when the sources are less than reputable (most are). Ironically earlier today I was curious and followed one trackback only to find my content on another site with back-links still intact? I do use anchor text in my href’s often so like you said it’s odd that they “scrapers” leave it all intact? Anyways thanks so much for the clarification.
Cheers!
I will try out all these tips mentioned. One of the biggest headaches for webmasters is others stealing their content, but you have mentioned a really practical solution to this big problem.
Great post and blog! I do not have time to read every post right now, but I have bookmarked it and added your feeds, so when I have time I’ll be back to read more. Please continue the great work.
Finding out such content scrapers of your site is really a tough job. But thanks for your nice post and tips I will act upon these tips mentioned in your post to trace content scrapers of my site.
Hi,
It’s very difficult to stop content scrapers. They will get the contents checked for copy scape, with the Google Analytics showed some results for copy scapes. And they do necessary steps.
In my opinion you can fight and win, but you still lose in one part: you will lose your time and health dealing with content scrapers. I have a question: will a content scraper’s post outrank our original post? Will our page be devalued by Google when there is duplicate content?
I go for taking the benefit from content scrapers.
While I certainly don’t track down every content thief, I do pay attention to scrapers and go after them — especially the ones stealing all of the blog content. I’m a writer. I make my living largely blogging for clients (and for myself). When my blog posts appear for free on sites without permission and those bloggers are monetizing them, it cheapens my work in the eyes of prospects and clients. That’s just as bad as someone stealing images that a photographer might want to sell in product form.
I’ve had several sites shut down over the years and even more forced to remove specific posts. A DMCA takedown notice to the host is always a good idea. And if you automate the infringement searches and you use templates for the notices, it really doesn’t take much time or effort to go after them.
I also tell other writers and bloggers to consider approaching two other groups which are better at hitting thieves where it hurts:
1. Search engines — if the stolen content is appearing in results at all, especially anywhere near (or above) your own
2. Advertisers — they can have their ad network account access suspended over copyright infringements which can not only knock out their income on that site, but on any site where they’re using the network’s ads (they often have scraper sites in more than one niche)
Learn to craft a firm cease and desist letter, and you rarely even have to go that far. The vast majority yank the content down within 48 hours when I send them a demand letter to that effect.
Great info in this post and in the comments! I’ve had several of my posts “scraped” and I always comment on the blog as soon as I find it to express my extreme ire and demand the poached post be immediately removed. If it is not, I proceed with further action but have so far gotten cooperation. Some of the scrapers apologize and pretend not to realize that they are stealing content because they included all my credits, but I am careful to educate them about posting content I wrote on their site without my knowledge or permission as being a hybrid of pirating and plagiarism, scraping. I always ask them if they would like it if I copied their posts and put them on my blog without them knowing about it. Another thing that really bugs me about scrapers – they are never good sites that steal my content, they are crappy sites and some are even questionable, that I don’t want to be associated with in any way.