June 24, 2004 12:36 PM PDT

Saving Shakespeare's blog

The British Library, famed for a collection that includes a First Folio of Shakespeare, two Gutenberg Bibles and the scribbled lyrics to "I Want to Hold Your Hand," is starting the initial phase of a project that may eventually lead to it archiving all U.K. Web sites.

A trial project to archive 6,000 British sites was announced this week by the U.K. Web Archiving Consortium. The group, led by the British Library, includes the Wellcome Trust, the National Archives and the Scottish and Welsh national libraries.

Each member of the consortium will choose content relevant to its subject. All types of Web content will be included, from government documents to blogs.

Richard Boulderstone, director of e-strategy at the British Library, said all types of material will be collected, including "informal material" such as discussion forums. "Letters and other informal works tell us how society is actually operating," he said.

The British Library will not censor the material, because it does not want to restrict what people in the future can learn.

"We would like to take a snapshot of every year, as a sample of what the Web looked like," said Boulderstone, suggesting that in the future, people could look back to 2004 and see the swear words that Web surfers were using.

At the moment, according to a consortium press release, online material exists for only about 44 days, not far from the life expectancy of a housefly.

A limited number of Web sites will be archived initially, but "ultimately, we would like to archive the whole U.K. Web," Boulderstone said.

One of the problems the consortium faces is that, under U.K. copyright law, permission is needed before a site can be archived. The British Library is working with the government to extend the law to allow it blanket access to all Web sites, because "there are 4 million sites that we would like to capture--we cannot ask everyone for permission," Boulderstone said.

The Web Archiving Consortium is not the first to archive the Web. The Wayback Machine, run by the United States-based Internet Archive, enables people to visit archived versions of Web sites.

According to Boulderstone, the British Library's approach differs from that of the Internet Archive in that it seeks permission from Web sites. In the future, the British Library hopes to improve on Wayback by archiving more frequently and with more depth, and by providing metadata so that information can be found more easily.

Ingrid Marson of ZDUK reported from London.

 
