Is RAID storage living on borrowed time?
The basic idea of RAID (Redundant Array of Inexpensive Disks) storage is combining multiple small, cheap disk drives into an array of disk drives (appearing to the computer as a single logical storage unit) that yields performance exceeding that of a SLED (Single Large Expensive Drive).
RAID offers many advantages over the use of single hard disks, including higher data security, fault tolerance, improved availability, and integrated capacity.
That said, RAID was invented more than 30 years ago and simply wasn't designed to work in the terabyte system world that is commonplace today. In fact, RAID is clearly beyond its design limitations for storage in the petabytes.
I discussed the limits of RAID via e-mail with Cleversafe CEO Chris Gladwin, and here's the problem as he sees it: RAID is mathematically reaching a breaking point for data reliability based on one-terabyte drives. RAID 6, based on parity, cannot recover from more than two simultaneous failures, or from two non-simultaneous failures plus an unrecoverable bit error. It also doesn't automatically protect data, which remains exposed to software, hardware and user error.
Typical SATA drives have a published unrecoverable bit error rate (BER) of 1 in 10^14, meaning that on average one bit in every 100,000,000,000,000 bits read will be unrecoverable. Although this failure rate seems insignificant, 10^14 bits is only about 12.5 terabytes, so when reading 100 terabytes (roughly 8 x 10^14 bits) it is nearly certain there will be an unreadable bit, and if this read happens during a rebuild, data will be lost.
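The arithmetic behind that claim can be checked with a short sketch. It assumes every bit fails independently with the published probability of 1 in 10^14, which is a simplification (real drive errors tend to cluster), and the function name `p_unrecoverable_read` is made up for illustration.

```python
import math

TB = 10**12  # decimal terabyte, the unit drive vendors use


def p_unrecoverable_read(bytes_read: float, bit_error_rate: float = 1e-14) -> float:
    """Probability of hitting at least one unrecoverable bit while reading.

    Assumption: each bit fails independently with probability bit_error_rate.
    """
    bits = bytes_read * 8
    # 1 - (1 - p)**bits, computed in a numerically stable way for tiny p
    return -math.expm1(bits * math.log1p(-bit_error_rate))


for size_tb in (1, 12.5, 100):
    p = p_unrecoverable_read(size_tb * TB)
    print(f"{size_tb:>6} TB read -> P(at least one bad bit) = {p:.4f}")
```

Under that independence assumption the chance is roughly 8% for a 1 TB read, about 63% for 12.5 TB (10^14 bits), and above 99.9% for 100 TB, which is where "nearly certain" comes from.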
There are still applications that can utilize RAID for increased I/O performance. For example, using RAID for a high I/O transactional system would be a good fit. Also, smaller storage applications, for example a terabyte or below, could still use RAID effectively.
Data continues to grow exponentially. Market researcher IDC estimated that the digital universe reached 281 exabytes in 2007 and will grow 10X by 2011. Enterprises in a number of industries, including media/entertainment, health care, and video surveillance, have already exceeded 100 terabytes of storage in use. Determining the appropriate long-term storage strategies for these industries will be a challenge as they realize the limitations of RAID.
The good news in addressing these data growth issues is the availability of low-cost processors and high-capacity drives. Combined, they provide great opportunities for disruptive innovations that will displace RAID.
Dave Rosenberg dishes up "Software, Interrupted" with nearly 15 years of technology and marketing experience that spans from Bell Labs to multiple start-up IPOs to open-source enterprise software companies. He is co-founder of MuleSource and currently serves as the general manager of Hardy Way. He is a member of the CNET Blog Network and is not an employee of CNET. You can contact Dave via e-mail at softwareinterrupted@gmail.com or follow him on Twitter @daveofdoom.





OTOH, sure - I can get that the parity algorithms are more than just a little outdated. I also get the concept that even in spite of on-the-fly de-duplication (and other means of shrinking the data mass), there's going to come a time when the amount of data being stored will exceed the ability of existing means to store it. Then again, it's not like we've not seen such a problem before.
I wonder if there's a corollary or a clone of sorts to Moore's Law, but only pertaining to storage?
http://www.cleversafe.com/vision/download-whitepaper
Simply register, and download 2nd whitepaper.
What it looks like from here is that every file stored is split into equal chunks, and each chunk carries part (e.g. 75%) of the parity/checksum/compressed information required to rebuild the other chunks attached to it, so as long as three out of four chunks are available the file is still recoverable.
These chunks are then distributed across a SAN which can be LAN or WAN based, the latter of which would be very useful in terms of offsite data recovery if you can afford the bandwidth.
A simple but very effective idea (of course, implementation's a different matter entirely).
While I fully expect them to put the best gloss on their own product with their claims of storage/bandwidth utilisation it's still not possible for storage/bandwidth to go over 200% no matter the number of sites.
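The "three out of four chunks" idea described above can be illustrated with a minimal sketch. This is NOT Cleversafe's actual dispersal algorithm, which the whitepaper would have to confirm; it swaps in plain XOR parity (three data chunks plus one parity chunk) as the simplest scheme with the same recover-from-any-three property. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes, k: int = 3) -> List[bytes]:
    """Split data into k equal chunks and append one XOR parity chunk."""
    chunk_len = -(-len(data) // k)              # ceiling division
    padded = data.ljust(chunk_len * k, b"\0")   # zero-pad to a multiple of k
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def recover(chunks: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the data when at most one chunk is lost (marked as None)."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity tolerates only one lost chunk")
    chunks = list(chunks)
    if missing:
        present = [c for c in chunks if c is not None]
        # XOR of all surviving chunks reproduces the lost one
        chunks[missing[0]] = reduce(xor_bytes, present)
    return b"".join(chunks[:-1])[:original_len]


data = b"RAID alternatives"
pieces = split_with_parity(data)
pieces[1] = None                                # simulate one lost site
assert recover(pieces, len(data)) == data
```

With three data chunks and one parity chunk the stored volume is 4/3 of the original, so the overhead is about 33%. Tolerating more lost chunks would need a stronger erasure code than XOR parity.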
A check-summing file-system designed for large data stores under development at the moment. Copy on write too...
Again though, this is an end-user usage of RAID, not a massive storage array usage, but RAID will continue to have its place in the end-user world.
- by mscritsm July 27, 2009 11:23 PM PDT
- If hard bit errors are approaching 1 in 100 TB, then disk drives themselves are destined to die as a technology. In a few years, a single disk drive will approach 100 TB of capacity, meaning the drive manufacturers will no longer be able to sell drives that are now guaranteed to come with at least one hard error on them at all times.
No, drive manufacturers will have to reduce hard bit errors even as they increase capacity or face extinction (in which case RAID becomes moot anyway). But if drive manufacturers do decrease the bit error rate in order to survive, the arguments against RAID based on bit errors will become invalid.