3 Myths About Duplicate Content

The words “duplicate content penalty” strike fear in the hearts of marketers. People with no SEO experience use this phrase all the time. Most have never read Google’s guidelines on duplicate content. They just somehow assume that if something appears twice online, asteroids and locusts must be close behind.

This article is long overdue. Let’s bust some duplicate content myths.

Note: This article is about content and publishing, not technical SEO issues such as URL structure.

Myth #1: Non-Original Content on Your Site Will Hurt Your Rankings across Your Domain

I have never seen any evidence that non-original content hurts a site’s ranking, except for one truly extreme case. Here’s what happened:

The day a new website went live, a very lazy PR firm copied the home page text and pasted it into a press release. They put it out on the wire services, immediately creating hundreds of versions of the home page content all over the web. Alarms went off at Google and the domain was manually blacklisted by a cranky Googler.

It was ugly. Since we were the web development company, we got blamed. We filed a reconsideration request and eventually the domain was re-indexed.

So what was the problem?

  • Volume: There were hundreds of instances of the same text
  • Timing: All the content appeared at the same time
  • Context: It was the homepage copy on a brand new domain

It’s easy to imagine how this got flagged as spam.

But this isn’t what people are talking about when they invoke the phrase “duplicate content.” They’re usually talking about 1,000 words on one page of a well-established site. It takes more than this to make red lights blink at Google.

Many sites, including some of the most popular blogs on the internet, frequently repost articles that first appeared somewhere else. They don’t expect this content to rank, but they also know it won’t hurt the credibility of their domain.

Myth #2: Scrapers Will Hurt Your Site

I know a blogger who carefully watches Google Webmaster Tools. When a scraper site copies one of his posts, he quickly disavows any links to his site. Clearly, he hasn’t read Google’s Duplicate Content Guidelines or the Guidelines for Disavows.

Ever seen the analytics for a big blog? Some sites get scraped ten times before breakfast. I’ve seen it in their trackback reports. Do you think they have a full-time team watching GWT and disavowing links all day? No. They don’t pay any attention to scrapers. They don’t fear duplicate content.

Scrapers don’t help or hurt you. Do you think that a little blog in Asia with no original writing and no visitors confuses Google? No. It just isn’t relevant.

Personally, I don’t mind scrapers one bit. They usually take the article verbatim, links and all. The fact that they take the links is a good reason to pay attention to internal linking. The links on the scraped version pass little or no authority, but you may get the occasional referral visit.

Tip: Report Scrapers that Outrank Your Site

On the (very) rare occasion that Google does get confused and the copied version of your content is outranking your original, Google wants to know about it. Here’s the fix. Tell them using the Scraper Report Tool.

[Image: Google scraper report tool]

Tip: Digitally Sign Your Content with Google Authorship

Getting your picture to appear in search results isn’t the only reason to use Google Authorship. It’s a way of signing your name to a piece of content, forever associating you as the author with the content.

With Authorship, each piece of content is connected to one and only one author and their corresponding “contributor to” blogs, no matter how many times it gets scraped.
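At the time of writing, Authorship is typically set up with a rel="author" link connecting the content to the writer's Google+ profile. A minimal sketch (the profile URL is a hypothetical example):

```html
<!-- On the article page: a rel="author" link to the writer's Google+ profile -->
<!-- (profile URL is a made-up example) -->
<a href="https://plus.google.com/110000000000000000000" rel="author">Andy Crestodina</a>
```

For the association to be verified, the Google+ profile must also link back to the site in its "Contributor to" section, completing the two-way connection.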

Tip: Take Harsh Action against Actual Plagiarists

There is a big difference between scraped content and copyright infringement. Sometimes, a company will copy your content (or even your entire site) and claim the credit of creation.

Plagiarism is the practice of someone else taking your work and passing it off as their own. Scrapers aren’t doing this. But others will, signing their name to your work. It’s illegal, and it’s why you have a copyright symbol in your footer.

If it happens to you, you’ll be thinking about lawyers, not search engines.

There are several levels of appropriate response. Here’s a true story of a complete website ripoff and step-by-step instructions on what actions to take.

Myth #3: Republishing Your Guest Posts on Your Own Site Will Hurt Your Site

I do a lot of guest blogging. It’s unlikely that my usual audience sees all these guest posts, so it’s tempting to republish these guest posts on my own blog.

As a general rule, I prefer that the content on my own site be strictly original. But this comes from a desire to add value, not from the fear of a penalty.

Ever written for a big blog? I’ve guest posted on some big sites. Some actually encourage you to republish the post on your own site after a few weeks go by. They know that Google isn’t confused. In some cases, they may ask you to add a little HTML tag to the post…

Tip: Use the rel=“canonical” Tag

Canonical is really just a fancy (almost biblical) word that means “official version.” If you ever republish an article that first appeared elsewhere, you can use the canonical tag to tell search engines where the original version appeared. It looks like this:

[Image: canonical link reference example]
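In HTML, it's a single link element placed in the head of the republished page, pointing at the original. A minimal sketch (URLs are hypothetical examples):

```html
<!-- In the <head> of the republished page; the href points to the original -->
<link rel="canonical" href="https://www.example.com/original-article/" />
```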

That’s it! Just add the tag and republish fearlessly.

Tip: Write the “Evil Twin”

If the original was a “how to” post, hold it up to a mirror and write the “how not to” post. Base it on the same concept and research, but use different examples and add more value. This “evil twin” post will be similar, but still original.

Not only will you avoid a penalty, but you may get an SEO benefit. Both of these posts rank on page one for “website navigation.”

Calm Down, People

In my view, we’re living through a massive overreaction. For some, it’s a near panic. So, let’s take a deep breath and consider the following…

Googlebot visits most sites every day. If it finds a copied version of something a week later on another site, it knows where the original appeared. Googlebot doesn’t get angry and penalize. It moves on. That’s pretty much all you need to know.

Remember, Google has 2,000 math PhDs on staff. They build self-driving cars and computerized glasses. They are really, really good. Do you think they’ll ding a domain because they found a page of unoriginal text?

A huge percentage of the internet is duplicate content. Google knows this. They’ve been separating originals from copies since 1997, long before the phrase “duplicate content” became a buzzword in 2005.

[Image: “duplicate content” over time]

Disagree? Got Any Conflicting Evidence?

When I talk to SEOs about duplicate content, I often ask if they have first-hand experience. Eventually, I met someone who did. As an experiment, he built a site and republished posts from everywhere, verbatim, and gradually some of them began to rank. Then along came Panda and his rank dropped.

Was this a penalty? Or did the site just drop into oblivion where it belongs? There’s a difference between a penalty (like the blacklisting mentioned above) and a correction that restores the proper order of things.

If anyone out there has actual examples or real evidence of penalties related to duplicate content, I’d love to hear ‘em.

About the Author: Andy Crestodina is the Strategic Director of Orbit Media, a web design company in Chicago. You can find Andy on Twitter.

  1. Nice post. I think it’s true; I’ve never seen a penalty for duplicate content. And the other way around, if people copy your content it can even be good for you, because Google knows who published it first, and that builds authority.

    • Rick, glad you found it helpful. Thanks for the great feedback :)

    • Andy Crestodina Jul 14, 2014 at 5:19 am

      Thanks for the comment, Rick. Part of the goal here was to talk about the myths, but also ask people if they’ve seen any real world examples. If you ever see an example of an actual penalty, please let me know!

  2. I agree with you here, especially the last point: Calm down, People.

    However, I work with a lot of marketers who are new to the world of blogging. They are strapped for time and looking to repurpose as much existing content as they can. They’re often starting a brand new blog (or one that hasn’t been established), and in that situation, too much duplicate content can easily make all their efforts worthless. So in general, I advise new bloggers to steer clear.

    Also, 2 years ago Burberry dealt with a big duplicate content issue between their US and UK sites, which Google saw as nearly identical: https://productforums.google.com/forum/#!topic/webmasters/WlUzNLFQB54/discussion

    • Adam, thanks for sharing the forum topic as I am sure it will help people here out :)

    • Andy Crestodina Jul 14, 2014 at 5:28 am

      That is a sad story, Adam.

      I tried to avoid the technical SEO issues when writing this, but in this case, it looks like one hreflang link would have avoided the entire issue. That’s exactly the purpose of that tiny bit of code.
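      For reference, a minimal hreflang sketch for a US/UK case like that one (domains are hypothetical examples):

      ```html
      <!-- In the <head> of each regional page; each page lists all language/region variants -->
      <link rel="alternate" hreflang="en-us" href="https://us.example.com/page/" />
      <link rel="alternate" hreflang="en-gb" href="https://uk.example.com/page/" />
      ```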

      We could easily write a follow up to this and look at the technical side. Maybe another day!

      Andy

  3. From a limited point of view, you might argue that your site won’t be hurt by scrapers. However, there are several ways in which you can be damaged – so nobody should be sanguine about or dismissive of these automated thieves.

    • Paul, thanks for the share :)

    • Andy Crestodina Jul 14, 2014 at 5:30 am

      Thanks for the comment, Paul.

      My suggestion here is that content marketers are overreacting and that there is a lack of evidence for actual penalties. If you can think of examples or evidence, please let me know.

      There are a lot of things worth worrying about, but duplicate content isn’t one that’s high on my list…

  4. Mark Traphagen Jul 10, 2014 at 6:24 am

    Great article, but there are two myths in the tips for the second myth.

    1. The Google scraper reporting form does NOT generate any direct action by Google on your case. They are using it to gather general data to improve their algorithm.

    2. Authorship is NOT used by Google as a confirmation of originality. This was confirmed in a Webmaster Central Hangout by Google’s John Mueller.

    • Mark, thanks for sharing your feedback. Looking forward to hearing more from you.

    • Andy Crestodina Jul 14, 2014 at 5:34 am

      Mark, it is always good to get your input! I didn’t assume that the scraper report would lead to a de-indexing of the scraped version, but I did suggest that it was a good action to take…

      On the other hand, until now I actually did believe that Authorship was a strong indicator of originality! I’ll have to research this more. Thanks for letting me know.

      I always learn something from you. I hope to one day collaborate on some content with you. You’re a true wealth of knowledge, Mark.

  5. Thank you for this article. I’ve always told my clients the same thing assuming similar reasoning. It’s nice to have an expert corroborate. :)
    /joseph

  6. Thanks for that excellent distillation of the duplicate content rules. That’s very helpful information.

  7. Scott Tillman Jul 10, 2014 at 7:15 am

    This is really interesting information. Thanks for the insights! The more we know about the way google works, the better off we’ll be in the end. I, personally, didn’t know about a lot of this, so thank you for clearing things up!

  8. Keith Winters Jul 10, 2014 at 7:18 am

    We have been under an algorithmic penalty from Google for over a year. Due to getting spammed by inbound links…we think. Thousands of spam inbounds. We now have a service dealing with disavowing those links, and Google has said that we should be removed soon.

    The service can only disavow like a couple hundred inbound links per month, and there are thousands.

    Only our home page and blog home page are currently coming up in Google searches, and mainly just for branded key words.

    Have you ever seen cases like this, with a penalty lasting a year or more? And can we crawl out of it?

    • Keith, I have seen cases like yours. I think you should just continue providing quality content and making sure you are increasing your engagement. With patience you’ll surely get there.

  9. Great post! There are a lot of myths here that I have been telling people about for ages. Nice to see someone back us up on it. Thanks for sharing

  10. Raymmar Tirado Jul 10, 2014 at 10:18 am

    Awesome post, answered a few questions and confirmed a few suspicions.

    I think that the individual content creator is about to come into a time where they control a new type of online currency. It will be in our authority and originality and I think google is getting it right with how they are adjusting their algorithms.

    The future of SEO will drive the future of a content-based sales economy, which we are just on the edge of. An open-source sales environment where people are the focus, not products.

    Thanks for the good stuff. Keep it coming.

    • Raymmar, glad you liked it. Thanks for sharing. Looking forward to hearing more from you :)

    • Andy Crestodina Jul 14, 2014 at 5:40 am

      Glad you liked this one, Ray. I’m thinking of writing some giant guide that addresses everything related to duplicate content. If I do, l’ll let you know…

  11. Georlandio Oliveira Jul 10, 2014 at 12:54 pm

    I see many people who take a blog article and modify it just slightly so it reads like a new article!

  12. Very good article. I didn’t know about most of these tools in Google. I want to use the Google Author tag to sign my content and have my pic on the Google listings. Not sure how to do it yet but I will look it up. Thanks again.

    • Greg, glad you liked it. Thanks for the feedback. Looking forward to hearing more from you :)

    • Andy Crestodina Jul 14, 2014 at 5:42 am

      Hi, Greg.

      There’s another post I wrote here for KISSmetrics that explains that. Just search for “Google Authorship” in Google and you should see it there on page one. But as Mark Traphagen mentioned in his comment above, digitally signing your article with Authorship doesn’t guarantee that Google will identify it as the original…

      Anyway, glad you found this useful!

  13. Thanks for the feedback!

  14. Patrick Mahan Jul 12, 2014 at 12:22 pm

    What are your thoughts about posting original content on a personal blog, then copying and pasting the entire post on the LinkedIn Publishing Platform and Google+? I like the idea of all my original content living on my site… But I also like the built-in communities that LinkedIn and G+ offer. Do you recommend this strategy? Why or why not? Thanks!!

    • Andy Crestodina Jul 14, 2014 at 5:47 am

      Great question, Patrick.

      I wouldn’t hesitate to do that, unless you think a substantial percentage of your LinkedIn community has already read it.

      In this case, the original has been out there for a while on your site, and the host of the duplicate is a giant social network that you don’t own anyway. I can’t think of a downside. Go for it, Patrick!

  15. Thank you Neil, I’ll share this with our followers, they all run company-blogs and will surely find this useful (and reassuring).

    I also just had a guest post on TNW and was wondering if I should republish it on my own blog. On this matter, what I like most is your personal take of keeping the personal blog strictly original. Thank you again!

    • Daria, glad I could help. Thanks for providing us with feedback :)

    • Andy Crestodina Jul 14, 2014 at 5:44 am

      Hello, Daria.

      I’m glad if you found this post useful! I’m thinking of writing a second post that goes into more detail…

      About republishing the guest post, I would either rewrite it to add more value (for the sake of quality, not the fear of a penalty) or just repost it and add the rel=canonical link pointing back to the original.

      Is that helpful?

      Let me know how it goes!

      Andy

  16. Excellent article. Thanks for that.

  17. I think it’s true duplicated text isn’t too much of an issue, so long as you don’t overdo it. eBay got slammed for it, notoriously, quite recently. They’re so famous though it wasn’t much of an issue. I’m pretty sure there’s a gross overreaction to the whole Penguin/Panda situation. It’s all been born from the lunacy of the early SEO days.

  18. Avoiding penalties is an important factor :-) and there’s definitely a lot of hype out there.
    While it’s debatable how much duplication you can get away with, the goal is to leverage as much out of your content as possible – so it goes without saying that avoiding duplicate content can only be good for business.
    It’s not about how much you can get away with, it’s about how much you can get out of your efforts.

  19. THANK YOU for this – I am seeing my blogging colleagues waste so much time and energy right now “hunting down” every last scraper and teeny bit of duplicate content, I can’t help but shake my head. It’s never bothered me much, mostly because I don’t have time to police it… But wow, my mind is even more at ease.

59 comments

