July 13, 2006 1:19 PM PDT

Microsoft looks to foil Web spammers

Researchers at Microsoft have developed a tool to scrub major search engines of Web sites that pollute search results and, ultimately, to help clean the Web of spam.

Called Strider Search Defender, the tool is designed to dig out Web pages that are a front for spam Web sites, according to a paper published by Microsoft researchers on Thursday. These Web pages typically reside on blogging sites and other services that provide free Web space, the researchers said.

Spammers soil the Web with countless links to their spam fronts in order to gain a higher ranking in search engines. "By cleaning up Web search, hopefully we can discourage spammers from cluttering the Web with spam," Yi-Min Wang, principal researcher at Microsoft, said in an interview.

Microsoft's tool doesn't find spam the traditional way, by looking at a site's content. Instead, it turns the spammers' activities against them by using search engines to find links to potential spam pages. These links are often posted as comments on blogs, in online discussion forums and in guestbooks, a practice known as "comment spam."

Search Defender starts with a list of confirmed spam Web addresses. A "Spam Hunter" part of the tool runs those addresses through search engines to find pages that link to the spam sites, using the "link:" query tag. Additional spam URLs found on those sites are, in turn, run through the Spam Hunter, resulting in a long list of potential spam sites.
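The expansion loop described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Microsoft's implementation: the function names, the `max_urls` cap and the in-memory sample data are all assumptions, and `find_linking_pages` merely stands in for a real "link:" search engine query.

```python
from collections import deque

# Hypothetical sample data standing in for search results and page contents.
LINKING_PAGES = {
    "spam-a.example": ["forum.example/thread1"],
    "spam-b.example": ["guestbook.example/page2"],
}
PAGE_URLS = {
    "forum.example/thread1": ["spam-a.example", "spam-b.example"],
    "guestbook.example/page2": ["spam-b.example", "spam-c.example"],
}


def find_linking_pages(url):
    """Stand-in for a 'link:' query: pages that link to `url`."""
    return LINKING_PAGES.get(url, [])


def extract_urls(page):
    """Stand-in for fetching a page and pulling out the URLs posted on it."""
    return PAGE_URLS.get(page, [])


def hunt_spam(seed_urls, find_linking_pages, extract_urls, max_urls=1000):
    """Breadth-first expansion of a seed list of confirmed spam URLs."""
    candidates = set(seed_urls)
    queue = deque(seed_urls)
    while queue and len(candidates) < max_urls:
        spam_url = queue.popleft()
        for page in find_linking_pages(spam_url):
            for url in extract_urls(page):
                # Newly seen URLs become candidates and are hunted in turn.
                if url not in candidates and len(candidates) < max_urls:
                    candidates.add(url)
                    queue.append(url)
    return candidates


if __name__ == "__main__":
    found = hunt_spam(["spam-a.example"], find_linking_pages, extract_urls)
    print(sorted(found))
```

Starting from a single seed, the sketch picks up the other addresses posted alongside it. The result is only a list of candidates, which is why a filtering step is still needed.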

Then, using another Microsoft research project, Strider URL Tracer, false positives are filtered out and a list of Web pages that redirect to spam sites is compiled. Strider URL Tracer actually visits each one of the Web addresses found by the Spam Hunter to see if it redirects to a secondary spam page.
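The redirect check can be sketched in the same spirit. The article does not say how Strider URL Tracer decides that a destination is spam, so this sketch simply assumes a known set of spam hosts; `next_hop`, `SPAM_HOSTS` and the sample URLs are hypothetical, and a real tool would load each page in a browser rather than consult a table.

```python
from urllib.parse import urlsplit

# Hypothetical redirect table standing in for actually visiting each page.
REDIRECTS = {
    "http://freeblog.example/cheap-pills": "http://spam-target.example/buy",
    "http://hop.example/a": "http://hop.example/b",
    "http://hop.example/b": "http://spam-target.example/buy",
}
SPAM_HOSTS = {"spam-target.example"}  # assumed list of known spam hosts


def next_hop(url):
    """Return the URL that `url` redirects to, or None if it does not redirect."""
    return REDIRECTS.get(url)


def final_destination(url, next_hop, max_hops=10):
    """Follow redirects until they stop, loop, or exceed max_hops."""
    seen = {url}
    for _ in range(max_hops):
        target = next_hop(url)
        if target is None or target in seen:
            break
        seen.add(target)
        url = target
    return url


def filter_redirectors(candidates, next_hop, spam_hosts):
    """Keep only candidates that redirect to a known spam host."""
    confirmed = []
    for url in candidates:
        dest = final_destination(url, next_hop)
        if dest != url and urlsplit(dest).hostname in spam_hosts:
            confirmed.append(url)
    return confirmed


if __name__ == "__main__":
    candidates = [
        "http://freeblog.example/cheap-pills",
        "http://honest.example/recipes",
        "http://hop.example/a",
    ]
    print(filter_redirectors(candidates, next_hop, SPAM_HOSTS))
```

A page that does not redirect anywhere, such as the recipes page in the sample data, is dropped as a false positive.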

"We use search engines to find them," Wang said. "Spammers are basically telling us: Here are my spam URLs."

Spammers use various online services to host spam fronts, including free Web hosting providers such as Tripod, Angelfire and Yahoo's Geocities, Microsoft said. Blogging services are also often abused; Google's Blogger at blogspot.com is especially popular, according to the research report.

"Our preliminary investigation shows that spam blogs hosted on blogspot.com appear to be particularly widely spammed and effective against search engines," the report said.

Microsoft's researchers are working with the MSN Search team to see how the search service could be cleaned up, Wang said. Additionally, he called on the Web community, especially the operators of blog and free hosting sites, to cooperate to combat the Web spam problem.

"In the end it is all about protecting the search engines. Because if the spam doesn't show up in any search engine result the spammer will not receive traffic," Wang said.

5 comments
Tech Report
by searchinginseattle July 13, 2006 2:04 PM PDT
The report that you said was released by MS today (MSR-TR-2006-97) is not new, just updated. It was first placed on the web several months ago, in early April. It was discussed and linked to on several blogs like this one. See the SEE ALSO link.
http://www.resourceshelf.com/2006/04/11/new-from-microsoft-research-strider-url-tracer-with-typo-patrol/
EXCELLENT: when I see 34,435,678 results I get discouraged
by qazwiz July 14, 2006 7:02 AM PDT
I sometimes want to see results of searches in other engines so I am also concerned if they might be deleting something I want to see. (like the Google lawsuit)

but this sounds like they will clean up the garbage.
Like if I search for dogs a site might say "dogs, poodles, dachshunds, collies... with ads that say: if you want to see my ***** pay $29.95 per month"

That is a very short synopsis of the crass things I've seen. Anything that will clean it up will be welcome, but what about the false positives? What about those sites that accidentally get in and don't get filtered out? I think it is a very possible situation for a lawsuit that should not be thrown out.
Good Luck Google getting this tech. All for MSN
by BMR777 July 14, 2006 2:36 PM PDT
Google's probably never going to get its hands on this tech. M$ will keep it to try and boost hits on its lame MSN search.

BMR777
http://www.webringamerica.com
Yes, read on Dotso.com
by JoeCrow July 14, 2006 3:44 PM PDT
This article isn't new. It was linked on Dotso.com not long ago ...
Reverse Google Adsense ID Lookup
by tagtooga July 14, 2006 9:16 PM PDT
Web spammers SPAM for 1 purpose: to make advertising revenue. The majority of that is via Google Adsense. TagTooga.com collects Adsense IDs and provides the reverse lookup both interactively and programmatically so you can see all the sites showing a particular Adsense ID. Of course, once TagTooga bans the spammer from its site, that information is no longer available -- TagTooga.com uses the reverse lookup to automatically delete all submissions/domains/URLs with the Google ID, and to prevent future submissions using banned IDs. (see http://geekswithblogs.net/chilkat/archive/2006/03/16/72540.aspx and http://www.tagtooga.com/pg/DownloadSites for more information...)