October 2, 2005 9:00 PM PDT

Yahoo to digitize public domain books

Yahoo is launching a library-digitization project to rival Google's controversial program.

Yahoo is working with the Internet Archive, the University of California and others on a project to digitize books in archives around the world and make them searchable through any Web search engine and downloadable for free, the group was set to announce Monday.

"If we get this right so enough people want to participate in droves, we can have an interoperable, circulating library that is not only searchable on Yahoo but other search engines and downloadable on handhelds, even iPods," said Brewster Kahle, founder of the Internet Archive.

The project, to be run by the newly formed Open Content Alliance (OCA), was designed to skirt the copyright concerns that have plagued Google's Print Library Project since it began last year.

The Authors Guild sued Google last week, alleging that its scanning and digitizing of copyright-protected books infringes copyright, even though Google plans to display only small excerpts in search results. Google argues that the project adheres to the fair use doctrine under U.S. copyright law, which allows excerpts in book reviews and the like.

Unlike Google, Yahoo will scan and digitize only texts in the public domain, except where the copyright holder has expressly given permission. The OCA project also will make the index of digitized works searchable by any Web search engine. Because Google is restricting public access to excerpts of copyright-protected books, it is keeping control over searches of all the digitized texts in its program.

The Internet Archive, a nonprofit formed to offer access to historical collections that exist in digital format, will host the digitized material. Hewlett-Packard Labs is providing technology for scanning books, and Adobe Systems is providing software licenses for its Acrobat and Photoshop software.

The University of California system, the University of Toronto, the European Archive, the National Archives in the United Kingdom, O'Reilly Media and Prelinger Archives are all providing content, which will include books, speeches, spoken-word audio, video and music, Yahoo said.

The University of California's 10 campus libraries have about 33 million volumes, of which an estimated 15 percent are in the public domain, said Daniel Greenstein, associate vice provost and University Librarian of the California Digital Library.

Greenstein said that contrary to publisher concerns that people will choose not to buy books if they can read or download them free online, the ability to easily find books on the Internet will broaden the public's exposure to them and is likely to increase, not decrease, sales.

"There is good evidence to suggest that if people see (that a book) is (out) there, they will buy it. Print sales either increase or are unchanged," he said. "We haven't once seen data to suggest that open access, at least to published printed works, decreases sales."

The University of California Press is likely to participate in the project, said Lynne Withey, director of the UC Press. "I'm all in favor of extending the availability of both books and journals in digital formats," she said. "So anything that does that in a way that respects authors' copyrights and also allows publishers to stay in business is a good thing."

By exposing more people to scholarly works, the OCA project could contribute to improved research and help reverse the trend among publishers of cutting back the number and print runs of books, said Lawrence Pitts, chairman of the University of California Academic Council Special Committee on Scholarly Communication.

Rising prices on books from academic publishers have meant fewer purchases by universities, he said. For example, academic presses that a few years ago printed 12,000 copies of a book are now printing as few as 250 copies, he said.

"It is a terrible problem in the liberal arts, in particular, of getting a first book published, and that is often the ticket to being hired by a good university and getting tenure," Pitts said. "Data show that if you can put the material in an open access arena, the mention of the work doubles or quadruples because people out there in the world can find it better."

The OCA is appealing to publishers and other libraries, universities and archives worldwide to offer materials as well. "This is an international effort, not just domestic," said Dave Mandelbrot, Yahoo's vice president of search content. For example, "we would be very eager to integrate French content into the Open Content Alliance and are working with people in France to make that happen."

After Google announced its effort, the French government said it would embark on its own book digitization project, complaining that the Google plan would only accelerate the domination of the English language over other languages.

The OCA effort was applauded by publisher and author groups that have been critical of Google's effort, including the Association of Learned and Professional Society Publishers, the Text and Academic Authors Association (TAAA) and the Authors Guild.

"It is a wonderful idea. It does all the good things that the Google project was represented as doing, but it respects the copyright," said Richard Hull, executive director of the TAAA.

"Sounds fine, but we would want to see the details, of course," said Paul Aiken, executive director of the Authors Guild. "We have absolutely no problem with digitization of public domain works. With copyright works, we want to make sure the people who actually have the rights are the ones granting the licenses. In most cases it would be the authors."

The OCA also is looking for ways to help publishers be compensated for offering copyright-protected books to the repository, said Mandelbrot. "We are working directly with publishers to come up with business models to encourage them to come up with ways to make works publicly available," he said.

O'Reilly will make some copyrighted works available, initially without compensation, to encourage others to participate, Yahoo said.

When asked to comment on the Yahoo project, Google spokesman Nate Tyler said, "We welcome efforts to make information accessible to the world."

5 comments

And how will they tell what's in the public domain?

Works in the public domain look exactly the same as works that are not, with very few exceptions where works are marked even though the law does not require marking a copyrighted work. If Yahoo is to avoid publishing anything that is not KNOWN to be in the public domain, they will have very little to publish!

And then there's the problem of the "shrinking public domain", where every now and then more works are legislated out of the public domain. Yahoo will have to keep track of those and keep removing works from their archives every time the law changes and takes more works out of the public domain, or every time the status of a work changes from "known to be in the public domain" to some state of uncertainty.
Posted by hadaso (468 comments )
what is "public domain"
Great point. An article I was reading on The Technology Suits was asking the same question: http://www.technologybizdev.com/2005/10/03/yahoo-tiptoes-into-book-scanning/ Yahoo may be running into the same problems Google is with copyright holders.
Posted by T-Byrd (9 comments )
hard to believe
why is everyone so afraid of 1's and 0's? If/when I write a book, I would like it to have the widest dissemination possible. Plus a digital copy means: a. protection from physical damage, b. global distribution, c. future-proofed. Why is this such a hard idea to grasp? The future is digital, get used to it. And I wonder why ebooks haven't taken off?!
Posted by agent V (34 comments )
libraries are hot
now if only these companies chose the right domain for the job... like:
www.unilibrary.com
sounds much better than "Google Print", "Open Content Archive", "Ulib", "DLib", etc etc
Posted by spytrdr (13 comments )
Free Curricula
Hopefully this will include old textbooks. They could be a big help to us at the Free Curricula Center (http://www.freecurricula.org).

-=Steve=-
Posted by (1 comment )