recaptcha

Google acquires ReCaptcha as book-scanning aid

Google has acquired ReCaptcha, one of the companies behind the distorted text boxes at the bottom of many Web site sign-in pages.

Terms of the deal were not disclosed, but Google plans to use ReCaptcha's technology both as a security measure within certain Google sites and to make its massive book-scanning project a little smarter, the company said in a blog post. ReCaptcha is an offshoot of Carnegie Mellon University's School of Computer Science, and puts a twist on the traditional captcha: a string of letters in squiggly text meant to confuse spam bots and other nonhuman Web …

ReCaptcha: Reusing your 'wasted' time online

ZURICH, Switzerland--Chances are that if you've solved one of those distorted-word tests to secure an account with Facebook, Craigslist, or Ticketmaster, you've helped The New York Times inch a little closer to digitizing its entire print newspaper archive from 1851 to 1980.

How have you unwittingly helped the Gray Lady by wasting 10 seconds on a computer-generated word challenge? It's thanks to a year-old initiative called ReCaptcha, a play on the antispam tests known as Captchas (Completely Automated Public Turing Test To Tell Computers and Humans Apart), a test that people can pass, but machines cannot.

People typically fill out Captchas so Web sites can verify that a human, rather than a spam bot, is behind the request for a new e-mail address, log-in, or membership. But with ReCaptchas, which are double-word tests, humans are also helping machines better recognize faded-ink or blurry words that have been digitally scanned from old newspapers or books--text that's difficult for a computer to recognize optically. That way, people will eventually be able to sift through print archives with a more intelligent search engine.

In the last year, as many as 600 million people have completed at least one ReCaptcha on sites such as Twitter, LastFM, and Ticketmaster, which use the technology for free, according to ReCaptcha creator and Carnegie Mellon University assistant professor Luis von Ahn.

With all those helping hands, von Ahn expects that The New York Times digitization project will be finished by the end of 2009, at the latest. (About five months ago, The New York Times paid an undisclosed sum to von Ahn's CMU team to complete its project.)

"We're reusing wasted human cycles," von Ahn, 28, said while speaking at a robotics conference here recently.

The venture involves putting millions of eyes on words printed in roughly 47,000 newspapers, with various counts of pages. For example, before the turn of the century, The New York Times was about one-fourth the breadth it is today. It's doubled in size about every 50 years or so since its beginning in the 1850s, when it was published every day except Sunday. (The New York Times did not immediately respond to a request for comment for this story.)

Von Ahn's team is also helping the Internet Archive with the digitization of books through ReCaptcha, but it's doing that project gratis.

Captcha T-shirt from Crusher knows if you're human

I dig goofy T-shirts, and this one stole my heart this morning. It comes from Web 2.0 invite service Crusher and emulates the style of a captcha, one of those often impossible-to-read pictures of warped and stretched words you need to translate to prove your humanity on most Web sites. Unlike real captchas, though, solving this one won't help translate old books or separate your Web identity from that of cold and calculating robots.

ReCaptcha: The smartest way to deal with something annoying

Spam, zombie robots, and the rest of the dark underbelly of the Internet have led to one of the Web's big annoyances: the captcha. That's the barely readable block of random letters you must translate in order to prove your humanness, and it's supposedly the one thing that separates us from the machines. It's also used in nearly every site registration process--and more recently at site logins. The bottom line is that it's annoying but also utterly necessary to keep evil at bay.

Enter reCAPTCHA, a project of the School of Computer Science at Carnegie Mellon University. A mix between disease-curing Folding@Home and MyCroft, reCAPTCHA requires users to solve two jumbled words: one is the actual captcha, and the other is a word that still needs to be translated into text. These words come from various scanned books and documents residing on the Internet Archive. Many of those books were written before computers, and in their current state (PDFs and image files) they are just glorified photographs--a medium that is still hard to search. Once complete, they'll be digital text and completely searchable.
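To make the two-word idea concrete, here is a minimal Python sketch of how such a check might work. It is an illustration only, not reCAPTCHA's actual code: the function names, the in-memory vote store, and the three-vote agreement threshold are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: for each unknown word, count every typed reading.
votes = defaultdict(Counter)


def check_challenge(control_answer, typed_control, unknown_id, typed_unknown):
    """Pass the user only if the known (control) word was typed correctly.

    When the user passes, their reading of the unknown word is recorded
    as one vote toward its eventual transcription.
    """
    passed = typed_control.strip().lower() == control_answer.strip().lower()
    if passed:
        votes[unknown_id][typed_unknown.strip().lower()] += 1
    return passed


def accepted_reading(unknown_id, min_votes=3):
    """Return the most common reading once enough users agree, else None.

    The threshold of 3 is an assumption chosen for this sketch.
    """
    if not votes[unknown_id]:
        return None
    reading, count = votes[unknown_id].most_common(1)[0]
    return reading if count >= min_votes else None
```

The user never learns which word was the control, so a careless answer on either word risks failing the test.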

Words for translation are not chosen at random. Scanned documents are checked by an Optical Character Recognition (OCR) engine, which is able to pick up many of the words. Those that the OCR misspells or finds impossible to read are plucked and put into the ReCaptcha word pool. Sites can implement ReCaptcha in several ways: there are plug-ins for WordPress, MediaWiki, phpBB, and PHP.
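The word-pool selection described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the project's real pipeline: it assumes the OCR engine reports each word with a confidence score between 0 and 1, that a plain set of known words stands in for a spell-checker, and that 0.9 is a reasonable cutoff. The function name `select_pool_words` is hypothetical.

```python
def select_pool_words(ocr_words, dictionary, min_confidence=0.9):
    """Pick the OCR results that should go to humans.

    ocr_words: list of (text, confidence) pairs, an assumed OCR output format.
    dictionary: set of lowercase known words, standing in for a spell-checker.
    """
    pool = []
    for text, confidence in ocr_words:
        unreadable = confidence < min_confidence      # OCR was unsure of the word
        misspelled = text.lower() not in dictionary   # OCR produced a non-word
        if unreadable or misspelled:
            pool.append(text)
    return pool


# Example: "qvick" is a non-word and "brown" was read with low confidence.
scanned = [("the", 0.99), ("qvick", 0.95), ("brown", 0.40), ("fox", 0.97)]
known = {"the", "quick", "brown", "fox"}
print(select_pool_words(scanned, known))
```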

I've embedded a sample ReCaptcha below. You'll notice both words look similar, as ReCaptcha is using both words from the same source, so you can't tell which one has already been solved.

[found on del.icio.us]…