Nutch

IBM BigSheets to preserve fleeting Web data

IBM announced Thursday that it is working with the British Library on a project that will preserve and analyze terabytes of information on the Web before it is lost forever.

Recent research estimates the average life expectancy of a Web site is 44 to 75 days. Every six months, for example, roughly 10 percent of Web pages on the U.K. domain are lost.

For most personal sites, this is no big loss. But for organizations attempting to archive and chronicle elections, news, media, and video, this data leakage presents massive challenges. And even if …