Web site owners might be amazed to learn that one of the biggest sources of duplicate content isn't external, but internal.
Certainly, popular sites and blogs that syndicate a lot of content have to deal with external duplication, but as I covered in my earlier discussion of external duplicate content, there are steps you can take to minimize those challenges and establish your site as the canonical source.
Internal, or on-site, content duplication tends to arise in three key ways. The first is within key page elements. The second is the content itself: much as e-commerce sites reuse stock product copy, you may be reusing your own copy over and over across your site. Third, it may simply come from too little differentiated copy.
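One way to check for duplication within key page elements is to group your URLs by the text of a given element and look for repeats. The minimal Python sketch below assumes you have already collected each page's title (or meta description, or another element) into a dictionary keyed by URL; the `find_duplicate_elements` name and the sample store titles are hypothetical, for illustration only.

```python
from collections import defaultdict


def find_duplicate_elements(pages: dict[str, str]) -> dict[str, list[str]]:
    """Group URLs that share the same page element text (e.g. a title tag).

    `pages` maps each URL to the element's text (hypothetical input format).
    Comparison ignores case and extra whitespace, so "Blue Widget" and
    "blue  widget" count as duplicates.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        key = " ".join(text.split()).lower()
        groups[key].append(url)
    # Keep only element values that appear on more than one URL.
    return {key: urls for key, urls in groups.items() if len(urls) > 1}


# Hypothetical sample data: two product pages share one generic title.
titles = {
    "/widgets/blue": "Widgets | Example Store",
    "/widgets/red": "Widgets | Example Store",
    "/about": "About Us | Example Store",
}
print(find_duplicate_elements(titles))
# {'widgets | example store': ['/widgets/blue', '/widgets/red']}
```

Exact-match grouping only catches identical elements; near-identical titles would need a fuzzier comparison.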
When it comes to Internet retailers, getting found in search results is often just as important as the right location is to brick-and-mortar retailers. When so much of online success comes down to words, why settle for the same sales copy everyone else is using?
All retailers, whatever their channel of choice, sell at least some of the same products as their competitors. If you are a big enough fish, you can command enough power to at least obscure that fact with different product names, model numbers, and so on, though the underlying product is often still the same. Ever wonder how some retailers can offer those huge price guarantees if you find the same product elsewhere for less? It's much easier when you have your own guarantee from the manufacturer that no one else can carry that exact model.
But online retail is a bit more challenging, because without the advantage of a convenient location, and aside from brand loyalty, the difference often comes down to search results: earning those highly coveted top rankings for the right searches. I began our duplicate content discussion by focusing on the question of a duplicate content filter versus a penalty, and on the challenges of external content duplication. What better way to bridge the gap from external to internal, or on-site, content duplication than by talking about sales copy?
Are you being outranked by yourself? Is "your" content showing up in searches, but on sites that aren't yours? Do you have multiple websites that compete against each other? Then this discussion of duplicate content from external sources should be right up your alley.
Earlier in the week, I started our discussion of duplicate content by trying to lay to rest the idea of a duplicate content penalty. Now we pick up that discussion with one aspect of duplicate content: content duplicated on other sites.
While I'd love to open with the idea that external duplicate content is the hardest kind to deal with, that isn't always the case, as you'll see when we talk about duplication on our own websites. For now, though, we'll focus on duplication from other sites.
At this point, you are probably in one of two camps: the "Yes, please help me with this" camp or the "What in the world are you talking about?" camp. So let's start by at least getting everyone into the same camp. Generally, external content duplication comes about in three ways.
Content Theft
In every aspect of life, there are those who want to get ahead through the hard work of others, even if it means acting illegally or unethically. The Web is certainly no exception, especially since, of all the ways to take advantage of other people's efforts, copy-paste must be the laziest. I mean, easiest.
Don't assume this issue only affects big-name brands and sites; anyone who publishes online is susceptible to this kind of attack. Keep in mind that what we're talking about here is essentially copyright infringement, not phishing sites and the like, which are a whole other level of criminal activity.
Realistically, this is probably the hardest form to combat, but in many cases it doesn't cause as much damage as you might think, and in many ways we have the search engines to thank for that. They're out to deliver the best possible results to searchers and are well aware of these issues, so I truly believe they work hard to identify authoritative, original sources of content. They can compare the content they find based on when they found it and on the links pointing back to it. And, purely as speculation, I'd imagine it would be fairly easy for the engines to score any site on the proportion of its content that appears elsewhere, and to distinguish natural patterns from unnatural ones.
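To make that speculation a little more concrete, here is a minimal sketch of one common near-duplicate technique, word shingling with a containment score. It is purely illustrative: it is not how Google or any other engine is known to work, and the five-word shingle size, the 0.5 threshold, and the function names are all assumptions.

```python
import re


def shingles(text: str, k: int = 5) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word runs) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short to shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(page: set, other: set) -> float:
    """Fraction of this page's shingles that also appear in the other page."""
    if not page:
        return 0.0
    return len(page & other) / len(page)


def duplicate_proportion(site_pages, other_pages, threshold=0.5, k=5):
    """Share of site_pages whose text largely appears in some other page.

    threshold and k are arbitrary assumptions for illustration.
    """
    others = [shingles(p, k) for p in other_pages]
    flagged = 0
    for text in site_pages:
        s = shingles(text, k)
        if any(containment(s, o) >= threshold for o in others):
            flagged += 1
    return flagged / len(site_pages) if site_pages else 0.0


# Hypothetical example: one of two pages has been copied onto another site.
site = [
    "our handmade oak tables are finished with natural oils and built to last for generations",
    "shipping is free on orders over fifty dollars and returns are accepted within thirty days",
]
scraped = ["Copied: " + site[0]]
print(duplicate_proportion(site, scraped))  # 0.5
```

Containment is used rather than plain overlap so that a scraper padding your text with extra material still registers as copying it.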
So what can you do about content theft? You can file reports under the Digital Millennium Copyright Act with the search engines (just search for "Google copyright infringement," or the equivalent for another engine, for specific details) or with the ISP that hosts the infringing domain, or you can pursue further legal action. But it may be better to first weigh the real impact you believe the theft has against the resources it may take to fight it, and decide whether it deserves your attention at all. Sometimes an email or letter to the infringer is enough.
Content Syndication
Ironically, you are probably the one most responsible for your own duplicate content on other sites. Articles syndicated through article directories and other content syndication services, RSS feeds of blog posts, and syndicated press releases will probably account for far more of your duplication woes than pirated content.
Each of these can be addressed, though. Articles and similar content are best kept distinct from any content on your own site. With this kind of content, it's often best to write specifically for the sites where it will be placed rather than for mass distribution. And, of course, you'll want to include a byline with a link back to your site.
Blog syndication can be handled a little differently. You may decide to include only a summary of each post in your feed, or the full post. Weigh the pros and cons, since a partial feed may discourage some sites from syndicating your blog at all. In many cases, there may be enough differentiation between your blog and the sites that syndicate your posts anyway. However, the best solution is to also include an absolute link (a full URL rather than a relative path) back to the post on your own site. This helps signal to the search engines that your post is the source.
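As a minimal sketch of that advice, assuming your feed items are built as HTML strings, the Python below appends an attribution link to an item. The site root `https://www.example.com/`, the `add_source_link` helper, and the attribution wording are hypothetical; the point is that `urljoin` turns a relative path into a full URL, so the link still leads back to your site wherever the item is republished.

```python
from html import escape
from urllib.parse import urljoin

SITE_ROOT = "https://www.example.com/"  # hypothetical site root


def add_source_link(item_html: str, post_path: str, title: str) -> str:
    """Append an absolute link back to the original post to a feed item's HTML."""
    # A relative href like "/blog/my-post" would resolve against the
    # syndicating site; urljoin makes it a full URL on your own domain.
    source_url = urljoin(SITE_ROOT, post_path)
    attribution = (
        f'<p>This post originally appeared at '
        f'<a href="{escape(source_url)}">{escape(title)}</a>.</p>'
    )
    return item_html + "\n" + attribution


# Hypothetical feed item body and post path.
result = add_source_link("<p>Summary</p>", "/blog/duplicate-content", "Duplicate Content")
print(result)
```

How the item is then written into the RSS or Atom feed depends on your blogging platform, which this sketch does not attempt to model.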
Press releases can be handled the same way as other syndicated content. Whether you distribute them through wire services or syndicate them from your site via RSS, including links back to your site helps signal the source. Press releases also tend to be more short-lived on external sites, though you should certainly keep an archive on your own site.
Micro-Sites
The final source of external duplicate content also falls under your control. A micro-site strategy consists of creating additional websites, often around niche topics. It grew out of the idea that if one website is good, many must be better: more sites would increase the chances of ranking in search engines and the number of listings for a particular search. Some view micro-sites as good and others as bad, but neither view is particularly accurate; it is the implementation that makes them good or bad.
Micro-site strategy is a much bigger topic, but bad implementation bears directly on our discussion of duplicate content. Most micro-site implementations duplicate the main website's pages, identically or nearly so, across the various micro-sites. That isn't surprising: creating unique content for one site, especially an e-commerce site, is challenging enough without doing it for several. But rather than improving rankings, the micro-sites tend to compete directly with the main site, and maintaining multiple sites demands more resources. That is why most micro-site implementations are bad.
As with many things, there are a few tools you can use in the fight against duplicate content. One that helps you stay on top of potential content theft is Copyscape, which lets you enter your page's address and returns a list of pages that may duplicate it.
Several weeks ago at SMX West, I had the pleasure of meeting and having lunch with Brian White from Google. White works on Matt Cutts' Web spam team, working tirelessly to make Google's search results the best they can be and to ensure the best user experience. Quite a hefty task indeed.
You'd think that someone who spends his days fighting the never-ending battle against Web spam might be a bit negative or jaded. If so, he does an amazing job of hiding it. Instead, he was upbeat, and you could hear the excitement in his voice as he spoke. Here's a guy who loves what he does and truly wants not only to improve searchers' experience on Google but to make the Web a better place. You can't help but like a guy who's fighting the good fight.
Printing Web pages can often be an exercise in frustration. It's amazing how often the most important information gets cut off along the right side of the page.
Web designers and makers of content management systems (CMS) have tried to ease that pain by creating printer-friendly versions of pages to make sure that site visitors get the goods.
Unfortunately, printer-friendly doesn't always mean search engine-friendly. Printer-friendly pages often create duplicate content, possibly even a complete duplicate of the entire Web site. Web site owners have been relieved to learn that search engines don't treat duplicate content as grounds for a penalty; instead, they apply a filter to decide which version is most appropriate to return in search results. But that doesn't mean this duplication has no negative impact.
This is one of those subtle areas where good design and SEO best practices intersect. If printer-friendly versions are created as entirely separate pages or as appended URLs (the original URL with a print parameter added, for example), they can dilute a site's PageRank and waste crawl equity as spiders crawl the duplicate pages. You can often spot these by looking for a link on the page that says something like "printer friendly" or "print this."
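If you would rather audit a page's HTML than eyeball it, a small script can do the looking. The sketch below uses Python's standard `html.parser` to collect links whose visible text matches phrases like "printer friendly" or "print this." The phrase list and the helper names are assumptions for illustration; your CMS may label its print links differently, and this inspects a single page rather than crawling a site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical phrases; adjust to whatever wording your CMS uses.
PRINT_PHRASES = ("printer friendly", "printer-friendly", "print this")


class PrintLinkFinder(HTMLParser):
    """Collect hrefs of <a> tags whose visible text looks like a print link."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.found: list[str] = []
        self._href = None  # href of the <a> we are currently inside, if any
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalize whitespace and case before matching phrases.
            text = " ".join("".join(self._text).split()).lower()
            if any(p in text for p in PRINT_PHRASES):
                # Resolve relative links so duplicates are easy to compare.
                self.found.append(urljoin(self.base_url, self._href))
            self._href = None


def find_print_links(html: str, base_url: str) -> list[str]:
    finder = PrintLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.found


# Hypothetical page markup.
page = ('<p>Intro</p><a href="/widgets?print=1">Printer Friendly Version</a> '
        '<a href="/about">About us</a>')
print(find_print_links(page, "https://www.example.com/widgets"))
# ['https://www.example.com/widgets?print=1']
```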
For example, say you have a Web site with 1,000 pages, a small to moderate-size site, depending on your perspective. Because you've taken advantage of your CMS's ability to automatically add a "print this" link from each page to a printer-friendly version, your site has, for all practical purposes, just doubled to 2,000 pages. But what if your PageRank isn't high enough to warrant rapid spidering? It could take much longer for all your pages to get indexed.
Some of your "good" pages may not get indexed when they otherwise would have, or they may end up in Google's supplemental index instead of the main index. Then there's the bandwidth wasted on crawling these duplicate pages. And what if your site has 10,000 or 100,000 pages instead? Clearly, more is at stake here than duplicate content simply being filtered out.
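The arithmetic behind the 1,000-to-2,000-page example can be sketched in a few lines. The fixed budget of 500 fetches per day is a made-up figure (search engines don't publish per-site crawl rates, and real crawlers prioritize and revisit pages rather than fetching each URL exactly once), so treat the output as a proportion rather than a prediction: one printer copy per page doubles the time needed to reach every URL and spends half the fetches on duplicates.

```python
# Hypothetical crawl budget: real engines don't publish per-site rates.
PAGES_PER_DAY = 500


def days_to_crawl(unique_pages: int, with_print_copies: bool) -> float:
    """Days needed to fetch every URL once at a fixed daily crawl budget."""
    # One printer-friendly copy per page doubles the number of URLs.
    urls = unique_pages * 2 if with_print_copies else unique_pages
    return urls / PAGES_PER_DAY


for pages in (1_000, 10_000, 100_000):
    before = days_to_crawl(pages, with_print_copies=False)
    after = days_to_crawl(pages, with_print_copies=True)
    # With print copies, half of every day's fetches go to duplicates.
    print(f"{pages:>7,} pages: {before:g} days -> {after:g} days")
```

Under these assumptions it reports 2 versus 4 days for 1,000 pages, 20 versus 40 for 10,000, and 200 versus 400 for 100,000.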
Printer-friendly pages are less of an issue on dynamic Web sites, where both versions are generated from the same database content as the regular pages. They can be an even bigger issue on sites where they are actually two separate pages, each of which must be maintained; it doesn't take long for such pages to fall out of sync.
By no means should you run out and remove the printer-friendly functionality from your site; it is arguably a valuable feature for your visitors. There are, however, alternatives worth exploring.
One method is to link to these pages with JavaScript, which search engine spiders generally don't follow. However, this may cause problems for visitors who have turned off JavaScript in their browsers, though that will probably be a small number of users.
A better method is to use CSS (Cascading Style Sheets) to create a separate printer style sheet. An added benefit is that you can remove the extra link from your pages. When visitors print one of your pages, the browser renders it using the printer style sheet rather than the one used for onscreen viewing, and visitors can even use print preview to see how it will look on paper.
Printer style sheets still have their challenges, but designers with CSS experience should be able to create one for most sites. With this method, you don't have to worry about duplicate content, appended URLs, or any other issues created by having separate URLs or pages for printing. Your regular pages are also your printer-friendly pages; it's no longer a matter of URLs or pages, but of presentation.