May 24, 2007 12:02 PM PDT

ReCaptcha: The smartest way to deal with something annoying

by Josh Lowensohn

Spam, zombie robots, and the rest of the dark underbelly of the Internet have led to one of the Web's big annoyances: the captcha. That's the barely readable block of random letters you must type out in order to prove your humanness, and it's supposedly the one thing that separates us from the machines. It's also used in nearly every site registration process--and more recently at site logins. The bottom line is that it's annoying but also utterly necessary to keep evil at bay.

Enter reCAPTCHA, a project of the School of Computer Science at Carnegie Mellon University. A mix between disease-curing Folding@Home and MyCroft [review], reCAPTCHA requires users to solve two jumbled words: one is the actual captcha, the other is just a word that needs to be translated into text. These words come from various scanned books and documents residing on the Internet Archive. Many of those books were written before computers, and in their current state (PDFs and image files) they are just glorified photographs--a medium that is still hard to sort through. Once the work is complete, they'll be digital text, and completely searchable.
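To make the two-word idea concrete, here is a minimal Python sketch. It is not reCAPTCHA's actual code: the `Challenge` class, the `grade` function, and the `scan-001` ID are hypothetical names, and the case-insensitive exact match on the known word is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Challenge:
    control_word: str      # the word whose answer is already known
    unknown_image_id: str  # hypothetical ID of the scanned word nobody has typed yet


def grade(challenge, control_answer, unknown_answer, transcriptions):
    """Pass or fail the user on the known word only.

    If the user passes, their guess for the unknown word is stored
    as one candidate transcription of that scanned image.
    """
    passed = control_answer.strip().lower() == challenge.control_word.lower()
    if passed:
        guesses = transcriptions.setdefault(challenge.unknown_image_id, [])
        guesses.append(unknown_answer.strip().lower())
    return passed


# Usage: one user who passes, one who fails.
transcriptions = {}
challenge = Challenge(control_word="morning", unknown_image_id="scan-001")
first = grade(challenge, " Morning ", "overlooks", transcriptions)
second = grade(challenge, "evening", "junk", transcriptions)
```

In this sketch the guess from a user who fails the known word is simply thrown away, so only people who have proved they can read the distorted text contribute to the book.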

Words for translation are not chosen at random. Scanned documents get checked by an Optical Character Recognition (OCR) engine, which is able to pick up many of the words. Those that the OCR engine misspells, or cannot read at all, are plucked and put into the reCAPTCHA word pool. Sites can implement reCAPTCHA several ways. There are plug-ins for WordPress, MediaWiki, phpBB, and PHP.
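The word-picking step might look something like the Python sketch below. This is a guess at the general shape, not the project's real pipeline: the `select_for_pool` function, the `(image_id, text)` pairs, and the idea of spotting misspellings by dictionary lookup are all assumptions made for illustration.

```python
def select_for_pool(ocr_words, dictionary):
    """Return IDs of scanned words that should go to humans.

    ocr_words: list of (image_id, recognized_text) pairs, where
    recognized_text is None when the OCR engine could not read the word.
    dictionary: set of known lowercase words.
    """
    pool = []
    for image_id, text in ocr_words:
        unreadable = text is None or not text.strip()
        # Assumption: a word missing from the dictionary counts as misspelled.
        misspelled = not unreadable and text.strip().lower() not in dictionary
        if unreadable or misspelled:
            pool.append(image_id)
    return pool


# Usage with made-up OCR output.
ocr_output = [("w1", "the"), ("w2", "qvick"), ("w3", None), ("w4", "fox")]
known_words = {"the", "quick", "fox"}
pool = select_for_pool(ocr_output, known_words)
```

Words the engine reads cleanly never reach a human, which keeps the pool limited to the cases where people actually add something.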

I've embedded a sample reCAPTCHA below. You'll notice that the two words look similar: reCAPTCHA draws both from the same source, so you can't tell which one has already been solved.

Related: inChorus [review]

[found on del.icio.us]

Josh Lowensohn is an associate editor for Webware.com, CNET's blog about cool and otherwise useful Web applications and services. If you've found a site you'd like profiled, shoot him an e-mail.
They're not first!
by hadaso May 28, 2007 1:38 AM PDT
Spammers have already been using human effort to get past captcha "barriers" and gain access to other systems they can abuse. They display the captcha they need to solve to their own users, who are dying to see the young female human they're watching covered by fewer clothes, and feed the result back to wherever the captcha image was displayed in the first place. That gets them access to useful resources such as multiple email accounts that can be used to send millions of junk emails to people who are not going to bother reading them (but still have to "just hit delete" a billion times).
Hopefully, there's multiple cross checking!
by TMB333 May 29, 2007 2:53 PM PDT
I tried a few of the sample reCaptchas, and after a few tries just ended up trying to guess which word was the correct Captcha, and 'maliciously' putting in some false word for the other. Normally, I'm a 'law abiding' citizen who tries to do good where I can; however, just like anyone else, occasionally I'll just want to let loose and may end up being bad just for the sake of being bad.

I understand that this is a good idea to help digitize old archives, but I'm assuming (or hoping) the developers of reCaptcha have taken into account that others out there may also be mischievous at times and type in incorrect words on purpose. If they haven't already thought of it, then they should be collecting a good number of matching answers from different users for each word before they accept the translation as truth.
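The cross-checking suggested in this comment can be sketched in a few lines of Python. This illustrates the commenter's idea only; nothing in the article says how reCAPTCHA actually resolves disagreements. The `accept_transcription` function and the `min_votes=3` threshold are hypothetical.

```python
from collections import Counter


def accept_transcription(answers, min_votes=3):
    """Return the agreed word, or None if too few users agree.

    answers: every guess submitted for one scanned word, each from a
    different user. min_votes is an arbitrary, assumed threshold.
    """
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if not normalized:
        return None
    word, votes = Counter(normalized).most_common(1)[0]
    return word if votes >= min_votes else None


# Usage: one prankster is outvoted by three matching answers.
agreed = accept_transcription(["upon", "Upon", "upon ", "xyz"])
undecided = accept_transcription(["upon", "xyz"])
```

With a threshold like this, a single mischievous answer cannot get a wrong word accepted on its own; it only delays acceptance until enough matching answers arrive.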