October 26, 2005 1:06 PM PDT

An open-source rival to Google's book project

(continued from previous page)

also purchase bound copies from Lulu.com for $8 each. The service even lets people create their own book covers and art, and then have the books printed with them. Users can search inside the works and see tabs on pages where the terms occurred. With the move of a cursor, visitors can see which page they will turn to before clicking on it.

Volunteers from LibriVox, an open-source effort trying to make books freely available in audio, have also made audio recordings of the books so that people can listen to them via the Open Library Web site.

In addition, the Internet Archive started "bookmobile" tours around the country to promote on-demand printing of the books. It has vans equipped with printers, binders and computers so that it can print books on demand for children across the country.

How it works
While Google has released few details of its scanning project (the search company has nondisclosure agreements with its library partners), the Internet Archive had a display of its technology at the Tuesday night event.

The Internet Archive built a specialized scanning machine and wrote open-source software called Scribe for the specific purpose of digitizing books. The "machine" is an assembly of a standard PC with the Scribe software installed, two Canon EOS cameras, a pedal-operated glass and metal stand to hold and secure books at an angle, along with a table and chair. The machine looks much like a photo or voting booth, with black cloth covering a box frame and shielding the books and computer gear from ambient light.

The chair seats one person, who operates the computer program and turns book pages by hand. During the scanning process, the book sits at a 90-degree angle under glass, which protects it from the camera light and causes the least amount of damage to its pages, according to the Internet Archive. The operator pushes a pedal under the table to release the book from under the glass, then turns the page before the machine takes the next picture.

Once a picture is taken, both pages of the book appear on a computer screen in their original form. The Scribe software then finds the center of the page and adjusts the picture's angle or cropping as needed. It also cleans up any poor coloring and makes it uniform.

The operator enters some metadata about the book--its author, title and publication date. Once the book is scanned, it is saved to the system and catalogued. Scribe takes the metadata from the book and matches it with data from existing card catalogs in order to prevent duplication. The work is then added to the digital record.
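The article does not say how Scribe performs this matching. As a rough illustration only, a duplicate check on the three metadata fields mentioned above (author, title, publication date) might look like the Python sketch below. The function names, the normalization rules and the sample record are all assumptions made for this example, not details of Scribe itself.

```python
import re

def catalog_key(author: str, title: str, year: int) -> tuple:
    """Build a normalized lookup key (hypothetical scheme, not Scribe's)."""
    def norm(text: str) -> str:
        # Lowercase, strip punctuation, collapse runs of whitespace.
        text = re.sub(r"[^\w\s]", "", text.lower())
        return " ".join(text.split())
    return (norm(author), norm(title), year)

def is_duplicate(record: dict, existing_keys: set) -> bool:
    """Return True if the scanned book's metadata matches a catalog entry."""
    key = catalog_key(record["author"], record["title"], record["year"])
    return key in existing_keys

# Illustrative catalog with a single made-up entry.
existing = {catalog_key("Twain, Mark", "Adventures of Huckleberry Finn", 1884)}
scanned = {"author": "twain mark", "title": "Adventures of  Huckleberry Finn.", "year": 1884}
print(is_duplicate(scanned, existing))
```

A real catalog match would likely need fuzzier comparison (variant spellings, editions, reprints); exact matching on a normalized key is simply the smallest version of the idea.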

It takes roughly one hour to scan two 300-page books. It costs an estimated 10 cents a page, split among data storage, labor, equipment and administration fees, according to Brewster Kahle, the project's leader. The cost does not take into account libraries' fees for getting the book to the scanners.
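Taken at face value, those figures work out to about 600 pages an hour per machine and roughly $30 to scan one 300-page book. The short Python sketch below runs the same arithmetic for any number of machines and hours. It assumes every machine sustains the quoted rate, and the function name and the eight-hour day in the usage line are choices made for this example rather than figures from the Internet Archive.

```python
# Figures quoted in the article.
PAGES_PER_BOOK = 300
BOOKS_PER_HOUR = 2
COST_PER_PAGE_CENTS = 10  # Kahle's estimate, in cents

def scanning_estimate(machines: int, hours: int) -> dict:
    """Rough throughput and cost, assuming a constant scanning rate."""
    pages = machines * hours * BOOKS_PER_HOUR * PAGES_PER_BOOK
    return {
        "pages": pages,
        "books": pages / PAGES_PER_BOOK,
        "cost_dollars": pages * COST_PER_PAGE_CENTS / 100,
    }

# Example: 10 machines running for an assumed eight-hour day.
print(scanning_estimate(machines=10, hours=8))
```

The estimate leaves out the libraries' own costs for getting books to the scanners, which the quoted per-page figure also excludes.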

Daniel Greenstein of the University of California's archive project said that his group has donated $500,000 to assess the ultimate costs of scanning from the libraries' perspective.

The Internet Archive currently has 10 scanning machines, but it is ramping up to build 10 more in the next year.

"This is one of the great things we've ever done," said Kahle. "It's up there with the Library of Alexandria and putting a man on the moon."

CNET News.com's Elinor Mills contributed to this report.


8 comments

What about the Gutenberg project?
http://www.promo.net/pg/list.html
Seems like ANY discussion of this sort of project MUST at least MENTION Project Gutenberg.
THOUSANDS of past-copyright books scanned in an ongoing project that was essentially open-source before open-source had a name.
Why doesn't the article make even a passing mention?
Posted by powerclam (70 comments )
Gutenberg, gutenberg, gutenberg
What a shame! You did not do your homework... What about spending 10 minutes looking for e-books? You can get 16.000 books from Gutenberg through P2P, RSS or you can download them to your PDA. You can even get DVD or CD images for the entire catalog (the million dollar DVD). You can check if the book you are transcribing is in the public domain. And this time the volunteer work is, without doubt, better than having a guy flipping pages in a voting booth contraption, because WE READ AND PROOFREAD the books. But no, quality problems are only for Wikipedia articles, I guess. Well, I can understand you: the project has been around only for 34 years... it is not news: for news, you have Google, google, google. Gurgle.
Posted by ciropabon (47 comments )
gutenberg.org
I agree with the previous comments.
It's very cheap reporting to not even MENTION Project Gutenberg as the grand daddy of all these new book-digitizing projects that are just warming up the scanners.
There's also SunSite, onlinebooks.library.upenn.edu and many others.
And then there's the coolest project of all:
UNILIBRARY (com/net/org)
Posted by spytrdr (13 comments )
Bookmobile connectivity
Yes, the bookmobile is driving proof that universal access is possible today. But there is a problem. And its name is Internet connection.
http://www.highspeedsat.com/bookmobilesinstalls.htm
Posted by finlandforum (10 comments )