ZURICH, Switzerland--Chances are that if you've solved one of those distorted-word tests to secure an account with Facebook, Craigslist, or Ticketmaster, you've helped The New York Times inch a little closer to digitizing its entire print newspaper archive from 1851 to 1980.
How have you unwittingly helped the Gray Lady by spending 10 seconds on a computer-generated word challenge? It's thanks to a year-old initiative called ReCaptcha, a play on the antispam tests known as Captchas (Completely Automated Public Turing Test To Tell Computers and Humans Apart), which people can pass but machines cannot.
People typically fill out Captchas so Web sites can verify that a human, rather than a spam bot, is behind the request for a new e-mail address, log-in, or membership. But with ReCaptchas, which are double-word tests, humans are also helping machines better recognize faded-ink or blurry words that have been digitally scanned from old newspapers or books--text that's difficult for a computer to recognize optically. That way, people will eventually be able to sift through print archives with a more intelligent search engine.
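To make the double-word idea concrete, here is a minimal Python sketch of how such a test could work. It assumes, as the scheme is commonly described, that one word in each pair already has a known answer while the other is a scan the software could not read. The names (UnknownWord, check_challenge, min_votes) and the three-vote agreement threshold are hypothetical, chosen for illustration, and are not taken from ReCaptcha's actual code.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UnknownWord:
    """A scanned word the OCR software could not read (hypothetical model)."""
    image_id: str
    votes: Counter = field(default_factory=Counter)

    def accepted_reading(self, min_votes=3):
        """Return the most common human guess once enough people agree on it."""
        if not self.votes:
            return None
        guess, count = self.votes.most_common(1)[0]
        return guess if count >= min_votes else None


def check_challenge(control_answer, typed_control, unknown, typed_unknown):
    """Grade the known word; if it is right, count the guess for the unknown word.

    Assumption: a person who types the control word correctly is probably
    also reading the unknown word carefully, so the guess is worth recording.
    """
    passed = typed_control.strip().lower() == control_answer.lower()
    if passed:
        unknown.votes[typed_unknown.strip().lower()] += 1
    return passed


# Usage: three people pass the control word and agree on the faded word;
# a bot fails the control word, so its guess is discarded.
word = UnknownWord(image_id="scan-0001")
results = [
    check_challenge("morning", "morning", word, "gazette"),
    check_challenge("morning", "Morning", word, "Gazette"),
    check_challenge("morning", "morning ", word, "gazette"),
    check_challenge("morning", "m0rn1ng", word, "xxxxx"),
]
print(results, word.accepted_reading())
```

In this sketch the Web site learns only whether the visitor passed the known word, while the guesses for the unknown word accumulate until enough of them match to be treated as the transcription.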
In the last year, as many as 600 million people have completed at least one ReCaptcha on sites such as Twitter, LastFM, and Ticketmaster, which use the technology for free, according to ReCaptcha creator and Carnegie Mellon University assistant professor Luis von Ahn.
With all those helping hands, von Ahn expects that The New York Times digitization project will be finished by the end of 2009, at the latest. (About five months ago, The New York Times paid an undisclosed sum to von Ahn's CMU team to complete its project.)
"We're reusing wasted human cycles," von Ahn, 28, said while speaking at a robotics conference here recently.
The venture involves putting millions of eyes on words printed in roughly 47,000 newspaper issues of varying page counts. For example, before the turn of the century, The New York Times was about one-fourth the breadth it is today. It has doubled in size about every 50 years since its beginning in the 1850s, when it was published every day except Sunday. (The New York Times did not immediately respond to a request for comment for this story.)
Von Ahn's team is also helping the Internet Archive with the digitization of books through ReCaptcha, but it's doing that project gratis.