Duplicate Content Penalty, RSS and Splogs
Here’s a topic I’ve wanted to tackle for quite some time, and it comes up quite a bit in the blogging community. Consider the frustration expressed by real estate blogger Jay Thompson:
I grow weary of realestatemindset.info stealing my blog content.
Let’s see how long they take to steal this. Posted at 7:09 PM Friday, Sept 28.
Question: How long did it take them to scrape and steal this post?
Answer: 29 minutes….
First of all, let’s talk about how this is done. Do spam blogs (splogs) have special bots that crawl around our sites? No. As a blog author, I have enabled RSS, or Really Simple Syndication. That means followers of this site who use newsreaders can see my posts. It also means sites that generate content dynamically can pull in my content and shape it any way they want. RSS is an automated way to distribute your content all over the web. It’s one of the first things I enable when I start any blog, this one included.
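To make that concrete, here is a minimal Python sketch of what any feed consumer does, whether it's a newsreader or a splog. The feed below is a made-up sample inlined as a string (the example.com URLs and post titles are assumptions for illustration, not a real feed), and a real consumer would fetch the XML over HTTP first.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the example runs without network access.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.com/</link>
    <item>
      <title>First Post</title>
      <link>http://example.com/first-post</link>
      <description>Full text of the first post.</description>
    </item>
    <item>
      <title>Second Post</title>
      <link>http://example.com/second-post</link>
      <description>Full text of the second post.</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return a list of (title, link, description) tuples from an RSS 2.0 string."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append((
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("description", default=""),
        ))
    return items


if __name__ == "__main__":
    # Whatever the author put in <description> is what the consumer gets to republish.
    for title, link, description in read_items(SAMPLE_FEED):
        print(title, "|", link, "|", description)
```

Notice that no crawling is involved. The consumer simply reads whatever the feed hands over, which is why the contents of the feed are the one thing the author controls.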
Why do these sites take content? They run ads that key off the content posted on their site. This practice is annoying because if you have trackbacks enabled, it can clutter up your comment section. WordPress now keeps trackbacks separate from comments, so it’s not that big of a worry.
What’s a trackback? When you link to a specific post in a blog, that blog can display a brief excerpt of what you wrote along with a link back to your own site. Instead of linking to Jay’s main page at http://www.phoenixrealestateguy.com, I linked to a specific page, which will generate a trackback to me if he has the feature enabled. I’m not sure he does 😉 You can see specific examples of trackbacks if you click on an individual post on this site and then go to the comments. Trackbacks appear in their own tab.
Let’s go back to why Jay was so annoyed. He didn’t see any value in getting that link and he wasn’t properly credited as the author. He’s got a great screenshot of the offending splog. I commented on his post that it didn’t really matter because an inbound link is an inbound link. That’s only partially true. A quality inbound link is one that comes from a site that has PageRank, is relevant to your site and has been around for a while. Splogs typically don’t have PageRank and haven’t been around very long, but the way they’re structured often makes them relevant. For a new blog, I don’t think it’s a bad thing to get these types of links. They send the spiders to your site and they may even send real visitors.
Besides the annoyance of having to turn off or delete trackbacks, the biggest concern bloggers have is that these splogs will outrank them on search engines and even reduce their own rankings on the results pages. This fear of a duplicate content penalty is unjustified. Let’s take a look. Jay’s post was called, “RealEstateMindSet.Info Steals Blog Content.” Type those words into Google and what do you get? Jay’s site holds the number one and number two spots. A couple of splogs turn up and there’s a result from another search engine. RealEstateMindSet.Info isn’t even around any more.
I know you’re thinking that’s just one example. It could be a fluke! Let’s look at another example. Here’s an article I wrote a few years ago and syndicated using RSS. Here are the search engine results:
Let’s face it…not everybody likes going to school and high school can be a terrible experience for many students. Whether you’re the hands on type who …
Ten Careers For High School Seniors Who Hate School. Let’s face it…not everybody likes going to school and high school can be a terrible experience for many …
How did my optimized page wind up number one out of all these pages that appear to have the exact same content? Remember my last post about the importance of page titles? Look at these SERPs and you’ll see only one result has our keyphrase in the title, description and URL. Two others have similar structures, but they’re not exactly the same. With those differences in descriptions and the meta tags we don’t see in the SERPs, it’s obvious the spiders don’t consider these to be duplicate pages. When you throw in some of the other ranking factors like age, it becomes apparent that a blog post copied by a splog will never be outranked and never be penalized.
As a blog author, you have little control over how your syndicated content is used. A splog may not link back to you. It may not properly credit you as the author. There are, however, two things you can control in your RSS feeds, and this is how I deal with splogs.
1. Don’t post your full feed. If you publish only an excerpt of each post, you’ll never have to worry about duplicate content penalties. Also, a splog will typically link back to the original source just in case a human visitor shows up.
2. Do link back to your own site somewhere in the post. RSS feeds are reproduced verbatim, so the link travels with the content. If you’re not posting your full feed, it’s helpful to include that link towards the beginning of the post because that’s what usually gets excerpted.
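The two tips can be combined in a rough sketch like the one below. Most blogging platforms handle excerpts through a setting, so this is only an illustration of the idea, not how any particular platform does it. The function name build_excerpt, the 55-word default and the "Originally published at" wording are all assumptions of mine.

```python
from html import escape


def build_excerpt(post_text, post_url, max_words=55):
    """Build a feed description: a link back to the post, then the first few words.

    Hypothetical helper for illustration; max_words=55 is an arbitrary default.
    """
    words = post_text.split()
    excerpt = " ".join(words[:max_words])
    if len(words) > max_words:
        excerpt += " ..."  # signal that the full post lives on the original site
    safe_url = escape(post_url, quote=True)
    # The link goes first so it survives even if a consumer truncates the text further.
    return 'Originally published at <a href="{0}">{0}</a>. {1}'.format(
        safe_url, escape(excerpt)
    )


if __name__ == "__main__":
    print(build_excerpt(
        "one two three four five",
        "http://example.com/my-post",
        max_words=3,
    ))
```

Putting the link ahead of the excerpt follows the second tip: whatever gets copied, the first thing in it points back to the original post.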
Blog authors can feel confident using RSS feeds to gain readers and search engine rankings because splogs will never outrank the original post. By following the tips in this article, you can maximize the automated power of RSS.