While most publishers of scraper sites stay underground, Michael Gray, a search optimization consultant who runs GrayWolf's SEO Blog, outed himself as a Web scraper in a blog post about a year ago.
"I've moved away from this. It wasn't worth the time and effort of doing it," he said in a recent interview. He said he aggregated "snippets" of others' content so he could flesh out his sites and make money off Google ads.
Gray also downplayed the significance of scraping. "Bloggers have a tendency to overreact to things and make mountains out of molehills," he said.
Gray said his sites fell under the "fair use" provision of the DMCA, which allows people a limited use of a copyrighted work without having to get permission. But the nature of the use should be noncommercial, said Dennis Kennedy, an information technology lawyer knowledgeable about intellectual-property issues.
"It's extremely difficult to track down the people doing this. And even then, you're probably not going to be able to establish jurisdiction, if they are outside the U.S.," he said. "It could be more expensive than it's worth, and you have to show damages."
Pretty much any site that puts out an RSS feed is going to get scraped, said Jonathan Bailey, Webmaster of Plagiarism Today. Typically, it's the same people sending out the herbal Viagra junk e-mail, he said.
"The black-hat SEOs (search engine optimizers) are doing this to build up Google juice (improve search engine rankings) or display Google AdSense ads," Bailey said.
Not only do scraper bots allow people to grab thousands of posts an hour, but there is software that can disguise the copied text by replacing certain words with synonyms, such as "feline" instead of "cat," Bailey said. This makes it harder for bloggers to track their scraped content.
The scraped site can even appear on Technorati before the original content, he said. And in some cases, images are getting scraped and "hotlinked" back to the original site, thus depriving that site of bandwidth and costing them money, he added.
Some people point the finger at Google. "They've been slow to shutter a lot of these accounts. It's in their best interest to keep them open for as long as they can," Bailey said.
"Google should do something about this," said Footnoted's Leder. "The entire revenue model for these sites is based on Google ads misdirecting content."
But Google has worked to cut back on the problem of Web spam over the last year, said Matt Cutts, a senior software engineer at Google.
"It's true: people can scrape very easily. But it's also much harder to spam than it has been in the past," he said. "For months and months, we've kicked people out of AdSense because they violated our quality guidelines."
Sites being scraped can report it to Google using the tools section on Google's Webmaster Central site and by clicking on an "Ads by Google" ad, Cutts said.
For sites that syndicate their content through feeds, adding a link to the original source of the article at the top or the bottom of a page with wording to the effect of "this article was originally printed here" will help ensure that Google's search engine displays the original item, not a reproduction, on a scraped site, he said.
Not every blogger is worried about scraper sites. Om Malik, executive editor of GigaOM, a blog that analyzes Net access and telecommunications services, said he doesn't waste time going after scrapers. Why not? "There are so many of those sites. Like (the Lernaean) Hydra's head, kill one, and more pop up."






Splogs are created to promote and increase search engine ranking of affiliated web sites, and/or to make money from ads shown on the splog. Typically splogs are automated, but they can also be manual copy & paste. A recent study indicated that 56% of all blogs are spam, and there are over 575 thousand splogs reported.
http://www.devtopics.com/splogs-spam-blogs-and-stolen-content/
Go ahead and pay for all your advertising... but if you allow AdSense or similar to appear on your site or blog, then you are just as guilty of profiteering as anyone who references your stuff. But at least it is your stuff!
Let's consider outsourcing for a moment. What is that? It's theft of a career - theft of a living.
Take note Lorelle VanFossen:
There is an economic war on the American people - and you are not exempt. You'll find no sympathy from [American] programmers, engineers, and soon to be - doctors, nurses, teachers and any other professional whose job can be done cheaper in India or China - or that they can do when imported here on visa. (GATS - Free trade in human commodities).
Some advice: Learn to like cleaning toilets and mowing lawns because that is the future for the American worker.... Oh wait...NOPE, Sorry. Those are the jobs that Americans won't do. We have illegals from Mexico for those jobs.
I guess you'll just have to punt.
I got an email about 10 years ago from a publisher who asked permission whether they could link to my site because direct linking was the issue of the day. My response was "of course... and thank you!"
There's little question that outright plagiarism is a problem on some sites (complete copy of content with no credit to the original content producers) and wholly illegal.
Lorelle VanFossen seems to take an extreme and counterproductive view of her precious words where any sentence fragment including a link back to her original source is akin to theft (as opposed to advertising). This makes her an ego-centric nut who totally misses the point of the web.
I depend on the *summaries* I get from many blogs I visit. If I'm interested in reading the full article, I click on the original content link. The biggest issue I'm finding is that the "source" link is usually another blog with another summary. Sometimes these types of links can go for three or four levels before I find the original source. I think this is a far bigger issue.
gotten vocal about it, and once again you've ripped off Weblogs, Inc.'s Download
Squad:
http://www.downloadsquad.com/2007/07/31/blog-pirates-on-the-horizon/
Please start citing sources and giving credit where credit is due. We link and cite you
guys constantly - the least you can do is give back to the community that is
supporting you.
Complete verbatim copying should be avoided everywhere. Think for yourself.
http://brain.com
Major non-fiction authors may reference scores of government documents, FOIA requests, newspaper articles, other authors' books and not every snippet of information might be cited. Is this stealing? Sometimes pushing Intellectual Property rights can go too far.
I've looked at Lorelle's site and, well, I'm not quite sure what her business model is. OK, she advertises her own blogging book, probably a print-on-demand item, for which she receives payment.
So, other than ego or hubris that someone 'could steal her words without attribution,' what actual damage has she suffered? It's not as though her site is subscription only.
As a successful published writer, I prefer primary source material. I'm quite sure if I encountered a scraper site with some portion of her precious words, I could quickly find her site, even if there was no attribution.
I also know I've been ripped off in my time by editors and publishers, at a cost of time or money or both. I once provided a group of photographs to an editor on request for one project. Later, I learned they'd been used to illustrate someone else's article in that magazine.
Since I was never asked, my letter to said editor was less than cordial. Oh, 'he intended to tell me but was so busy ... ' and also forgot to pay. In the end, I got a check for what it'd cost to produce 'the 8x10 glossy prints with a paragraph on the back' [will Arlo Guthrie sue me for using those words? ha!]
No surprise I never sold another article to that magazine again.
There is no 'right not to be offended' but if plagiarism costs you money, or time, you do have a right to whine about it. Otherwise, ****!
And the lawyerspeak of this statement: "the nature of the use should be noncommercial" re 'fair use' is just silly. Has this bozo never realized reviews of books and movies in print media by professionals are typically paid for?
Ah, America, the land of litigation. Shakespeare was right.
Of course, if the whole article is reproduced, then it is a problem, but I do not see this behavior very often.
The reason why I do not support Lorelle's opinions and actions on this matter is that I really do not want to see a general copyright mania on the web where everybody is suing everybody. Let's not forget that the web is basically about information and free access to it.
Actually, if you do not want people to take and use your content, put up a login box and restrict access to registered users only. Then you will have control.
I find it very ironic that a blogger who has RSS feeds is upset that users are aggregating the content.
- I don't steal this web content
- by amko_sa August 17, 2007 11:50 AM PDT
- Maybe your text is taken from other people's ideas or sites too. Every word is copyrighted. Some people respect others' news and copy only part of it, with a source link. That is OK with me. Everything on the Internet is copyrighted, and most people on the Internet use other sources for their own stories. We are all copyrighted. Sue the whole world for copyright. If your story is very important, you can sell it on the marketplace. For me, the Internet is a free place for all people. If you sue websites because part of your text is on them, when maybe I am here with the help of those sites, then your story is not important to me. Those sites considered your story important and interesting.
Sorry about my English.