How long is long-term storage?
There is a big disconnect between how long people think they should be storing data and how long they actually can. One group of vendors and academics is trying to change that.
Two years ago, the Storage Networking Industry Association's Data Management Forum reported the results of a landmark study on the state of long-term storage, i.e., preserving a digital object for more than 10 years. Some disturbing results jumped out.
The study suggested that we live in a digital version of the Dark Ages. I'm talking about it now because I think the messages from the study are still very relevant to both IT administrators and consumers.
A whopping 80 percent of the 276 organizations included in the study reported a need to retain electronic records for more than 50 years, so let's start there. How many of you storage administrators out there actually think you can retain electronic records for 50 years with current technology? Without data loss? OK, you won't be doing the same job 50 years from now, so why care? Next question: How many of you think you can do more than three migrations of archival data from one storage medium to the next without data loss? According to the study, very few of you.
Here's one for consumers: How many of you using Internet photo-sharing sites think that your digitized images will still be there 50 years from now? You haven't thought about that, right? You and your spouse take pictures of the newborn today, you store them online, and maybe you store them at home, too. Here's a suggestion: print them and preserve the prints for as long as you can. If the enterprise-level storage administrators who have been doing digital storage for decades have little confidence in their ability to do long-term digital preservation, you shouldn't have much confidence either.
So there's a big gap here. A group of concerned vendors and academic advisers has formed the 100 Year Archive Task Force, under the auspices of the Storage Networking Industry Association's Data Management Forum, to start filling it. You can follow their progress or become involved yourself here.
One more result from the study still has me puzzled. Slightly more than half of the 276 organizations surveyed reported the need for "permanent" storage. What might fall into the permanent category? I thought of the Founding Fathers writing the U.S. Constitution and wondered what that process would have been like if they had all been using a collaborative workflow tool like Microsoft SharePoint. Surely they'd print out the final version for all to see (on parchment, maybe?). But what about all the draft versions and the messages back and forth? In short, what about all the supporting documentation that clues us in to their state of mind and tells us what they really intended? Would they have printed out all of that, too? I dare say that insight would be gone forever.
We rarely, if ever, think of saving our digitized thoughts for posterity. But for the sake of historians, lawmakers, sociologists, and scientists yet to be born, we should. Otherwise, people centuries from now might look back on this era as the digital Dark Ages.
John, a senior partner at Evaluator Group, has 30 years of experience in enterprise IT storage, spanning mainframe and open systems environments. He has served as principal IT adviser at Illuminata and has held analyst positions at IDC and Yankee Group Research. He also co-authored the book "Inescapable Data: Harnessing the Power of Convergence." John is a member of the CNET Blog Network and is not an employee of CNET.





What about handling the constant changes in digital data formats? Even with perfect data preservation, could any modern programs read, say, a database stored on disk 40 years ago?
I'd argue that is a much larger problem than the finite lifespan of digital storage media.
There's also the matter of keeping the hardware around, too.
I can attest to this being an important issue. Remember when CDs became all the rage? Store your "long term" data there... I've got quite a few CDs from that era, and many are unreadable, especially the "blue-tint" ones.
Nice. And my 5 1/4" disks from 1982 are still readable today.
Hmm.
And if I lose one of those, I've lost one disk of data, typically 360 KB. Lose a hard drive, and you lose everything on it: 50 GB for my current computer. Lose a CD, and you lose what, 700 MB? All at once.
You may lose a bit, but that's better than nothing, and the older the data, the better the chance of recovering it. At least I'm planning on going to Blu-ray and taking this approach, with fewer discs than DVD. I have 10 GB of pictures alone, plus a couple of other things, and DVD started to look too small for the task.
We need some media that's permanent once written.
The same thing would happen to paper documents in a flood.
Paper documents end up in fragments and take a lot of work to salvage whatever can be saved.
Stone tablets were better.
Another issue is *what* to save. As the article alludes to, we have copies of the US Constitution, but not all the letters or dialogue exchanged that went into it. Is that important? I claim not as important as the end document.
Today, there seems to be a tendency to save *everything*. I'm being overloaded with information, from paper privacy notices to electronic info.
There was actually a company that modified a disc writer to physically etch a pattern onto a disc instead of just changing some chemicals.
Something like this would be a godsend, but the actual burning process might need to go through some safety tests first, as would the handling of any residue and by-products it creates.
And to be future-proof (even after humans might be long gone), the data could be printed large at first and then progressively smaller, down to CD pit sizes.
There could also be some sort of pictogram, such as a magnifying glass enlarging some of the letters.
Any sufficiently intelligent race should then understand that there is hidden data on the disc.
The only problem is that this data would be stored in something like binary, so unless some robots find it, I doubt they'd understand it.
And even then, they would need knowledge of ASCII/Unicode and hexadecimal to know what it means (not to mention image, video, and audio formats).
Screw the future...
Quick, what does www.beenz.com look like? Nine years (or so) ago, it was a huge to-do as a site for alternative forms of money. Whoopi Goldberg was on TV pimping the site for all she was worth.
Now? It's a parked domain with no original content. The best you can hope for is the Wayback Machine, which only holds partial content (whatever it was allowed to crawl). Here's that link from 2000:
http://web.archive.org/web/20001019084446/www.beenz.com/splash.html
Poke around in there a while: you'll find half the images missing, none of the data working, and even the Flash games broken.
Here's an even better one: pets.com was bought by PetSmart, and the site now contains none of its original documentation or content. Maybe PetSmart has it, maybe it doesn't... but if you had anything on it, you probably don't now.
:)
Of course, electronic archiving saves space and resources, but with constantly changing technology (CD, DVD, DVD-R, Blu-ray, etc.), we cannot be on the cutting edge every time a change occurs. What shall we do?
How about medical data such as imaging? Most diagnostic imaging has gone completely digital: instead of on film, images are now stored on PACS servers, and this storage is considered indefinite, at least in the sense that the records are not purged.
I wrote about so-called "forward-compatibility" back in 2004:
http://www.datamobilitygroup.com/saltworks/archives/27#more-27
And it's still a problem today as you can see from John Webster's article.
I maintain that paper (especially archival quality) is still one of the best forms of long-term storage for your most important personal documentation: treasured photos, birth certificates, marriage licenses, etc. Yes, it comes with its own issues, but it will not require nearly as much babysitting as it would in digital form.
http://www.millenniata.com/index.html
I have always felt that the National Archives should be expanded and opened to the public, and that it should be their job, using taxpayer money, to back up and archive the data of the citizens of the United States. In a hundred years, more or less, that information will be important. Imagine if we had as much information about ancient Egypt 5,000 years ago as we have about today. Imagine what people 1,000 years from now could learn if they had access to all of today's digital files. This information must not be lost. If it is, then we might as well do what the ancient Egyptians did and chisel it into stone, because I fear that will last longer than our digital files, and there is something wrong when chiseled stone lasts longer than digital data. Stone degrades; digital doesn't.
Given a standard format, this could even be read electronically by optical devices with OCR capabilities.
If you really wanted it to last longer, you could encase it behind a quartz window.
- by fazalmajid September 22, 2009 5:22 PM PDT
- The limiting factor for the longevity of data is not technical, it is curatorial. Data can survive storage technology and even format changes (with some loss of fidelity, e.g. downconverting word processing formats to plain text) but it has a hard time recovering from the turnover of people who maintained it.
In most cases, this is a good thing. There are 150,000 books published each year in the US alone. I doubt more than a handful a year are truly worth preserving for posterity. The sooner the dross gets purged, the less it will clutter future efforts to unearth useful literature.