August 2, 2007 4:00 AM PDT
Please don't steal this Web content
"It's one of my favorite subjects," she said. "I make my living from my writing, and when people take it because they are ignorant of copyright laws--or think that because it's on the Internet, it's free--it makes me really mad. It's stealing content, in my mind."
VanFossen isn't referring to the kind of plagiarism in which a lazy college student copies sections of a book or another paper. This is automated digital plagiarism, in which software bots copy thousands of blog posts per hour and republish them verbatim on Web sites where the contextual ads placed next to them generate money for the site owner.
Such Web sites are known among Web publishers as "scraper sites" because they effectively scrape the content off blogs, usually through the RSS (Really Simple Syndication) and other feeds that those blogs use to distribute their posts.
VanFossen's Lorelle on WordPress blog is an authority on the Internet for blogging dos and don'ts. One of the no-nos is using content from other sites without getting permission.
VanFossen has several ways of checking to see if other sites have scraped her posts. She puts full links in her posts to other articles of hers so that when one of her stories is posted on another Web site, it will link back to her story, and she can see the Trackback. Trackback is a "linkback" method Web publishers use to identify who is linking to or referring to their articles.
She has set up Google Alerts with her byline so that she will get notifications any time Google comes across a news site or blog with a reference to her. She also does a keyword search for her name on Google search, Google Blog Search and Technorati. In addition, she uses a WordPress plug-in that allows her to insert a digital fingerprint, a series of unrelated words, into her posts that she can search on in case her byline is stripped.
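The fingerprint idea can be illustrated with a short Python sketch. This is not the WordPress plug-in VanFossen uses, whose internals the article does not describe; the word list, the secret and the function names `make_fingerprint` and `contains_fingerprint` are all hypothetical, chosen only to show how a run of unrelated words might be derived for each post and later looked for in copied text.

```python
import hashlib
import re

# Hypothetical word list: any fixed set of ordinary, unrelated words would do.
WORDS = [
    "lantern", "gravel", "orchid", "thimble", "quartz", "meadow",
    "anchor", "pepper", "velvet", "compass", "ribbon", "walnut",
    "harbor", "saddle", "marble", "cactus",
]


def make_fingerprint(secret: str, post_id: str, length: int = 4) -> str:
    """Derive a repeatable run of unrelated words for one post.

    The same secret and post ID always give the same words, so the
    author can regenerate the phrase later and search for it.
    """
    digest = hashlib.sha256(f"{secret}:{post_id}".encode("utf-8")).digest()
    return " ".join(WORDS[b % len(WORDS)] for b in digest[:length])


def contains_fingerprint(text: str, fingerprint: str) -> bool:
    """Report whether a page's text still carries the fingerprint.

    Case and whitespace are normalized first, because a scraper may
    reflow the text even when it copies the words verbatim.
    """
    normalized = re.sub(r"\s+", " ", text.lower())
    return fingerprint in normalized


if __name__ == "__main__":
    fp = make_fingerprint("my-private-secret", "post-2007-08-02")
    scraped = "Copied article body with no byline. " + fp + " More copied text."
    print(fp)
    print(contains_fingerprint(scraped, fp))
```

In practice the author would paste the generated phrase into a search engine rather than scan pages locally; the check above only mirrors that search on text already in hand.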
Invariably, VanFossen comes across her posts on other sites.
If she hasn't had a previous problem with a site, she will send the site publisher an e-mail asking them not to use her content without her permission. If she doesn't get a response, or she has had problems with the site in the past, she sends a "cease and desist" letter that informs the owners that they are violating her copyright and warns them she will take legal action under the Digital Millennium Copyright Act, or DMCA, unless they remove her content.
VanFossen also contacts the company that hosts the Web site, as well as advertisers on that site and search engines, providing the necessary evidence via mail or fax, as required. "The DMCA puts the onus on advertisers, Web hosts and search engines to remove copyright violations," she said. "I have a form letter I use."
In December, Michelle Leder, editor of Footnoted.org, used a cease-and-desist order to get her content taken off a site that was continuously republishing her posts. "Even the post I wrote about him stealing my content was posted on his site," she said with a laugh.
"It wasn't the issue of money," Leder added. "When other people's business model is based on stealing content, that's a significant problem."
One site that offers a free service for tracking copyrighted content online is CopyScape. About 200,000 Web site owners use the free service every month, and thousands pay for a higher-level service, said Gideon Greenspan, chief technology officer of Indigo Stream Technologies, which offers the service.
There are many aggregator Web sites that collect content from a variety of sources, often related to a specific topic area, like real estate or cars, around which they can serve contextual ads. While some of the sites reproduce entire blog posts or articles from other sites (CNET News.com included), others offer just headlines or the first paragraph or a few paragraphs. Many include attribution and a link back to the original article. But providing attribution does not preclude a copyright violation, experts say.