I ran across a recent blog post by storage vendor Cleversafe titled "Three Reasons Why Encryption is Overrated," and, as I suspected, it generated a lot of discussion in online forums (LinkedIn, Google Groups, log-in required for both) dedicated to those issues.
Beyond the sensationalist headline, the post does raise some interesting points for consideration on the topic of encryption.
- Future processing power--In the future, malicious hackers will be able to crack older encrypted files due to increases in processing speed.
- Key management--An encrypted file has a key to unlock it. Lots of files means lots of keys. Lots of anything equals management headaches.
- Disclosure laws--Such laws mandate that data breaches be reported. Whether or not the exposed data is safely encrypted doesn't really matter at that point--the court of public opinion has branded you guilty.
Distribution or dispersal of data (Cleversafe's approach) is certainly one way to deal with emerging security threats, but it may not be the right way for everything. The important thing is to start looking at new technologies and methods to determine what's right for your business and technology strategy.
Follow me on Twitter @daveofdoom.
The basic idea of RAID (Redundant Array of Inexpensive Disks) storage is combining multiple small, cheap disk drives into an array of disk drives (appearing to the computer as a single logical storage unit) that yields performance exceeding that of a SLED (Single Large Expensive Drive).
RAID offers many advantages over the use of single hard disks, including higher data security, fault tolerance, improved availability, and integrated capacity.
That said, RAID was invented more than 30 years ago and simply wasn't designed to work in the terabyte system world that is commonplace today. In fact, RAID is clearly beyond its design limitations for storage in the petabytes.
I discussed the limits of RAID via e-mail with Cleversafe CEO Chris Gladwin, and here's the problem as he sees it: RAID is mathematically reaching a breaking point for data reliability based on one-terabyte drives. RAID 6, based on parity, cannot recover from more than two simultaneous failures, or from two non-simultaneous failures plus an unrecoverable bit error. It also doesn't automatically protect data, which remains exposed to software, hardware and user error.
Typical SATA drives have a published bit error rate (BER) of one in 10^14, meaning that once every 100,000,000,000,000 bits read, there will be a bit that is unrecoverable. Although this failure rate seems insignificant, 10^14 bits is only about 12.5 terabytes, so a full read of an array that size is more likely than not to hit an unreadable bit, and at 100 terabytes it is nearly certain. If that read happens to be during a rebuild, data will be lost.
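For a quick back-of-the-envelope check of those odds, here is a minimal Python sketch. It assumes bit errors are independent and uses decimal terabytes (10^12 bytes); the function name and the sample sizes are made up for illustration and don't come from any vendor's spec sheet.

```python
import math

TB = 1e12  # decimal terabyte, in bytes (assumption)

def p_unreadable_bit(bytes_read: float, errors_per_bit: float = 1e-14) -> float:
    """Chance of hitting at least one unrecoverable bit while reading bytes_read bytes.

    Assumes every bit fails independently with probability errors_per_bit.
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^n, written with log1p/expm1 so tiny p does not round away
    return -math.expm1(bits * math.log1p(-errors_per_bit))

for tb in (1, 12.5, 100):
    chance = p_unreadable_bit(tb * TB)
    print(f"{tb:>6} TB read: {chance:.2%} chance of an unreadable bit")
```

Under these assumptions a one-terabyte read is fairly safe, while the odds climb quickly as the amount of data read during a rebuild grows.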
There are still applications that can utilize RAID for increased I/O performance. For example, using RAID for a high I/O transactional system would be a good fit. Also, smaller storage applications, for example a terabyte or below, could still use RAID effectively.
Data continues to grow exponentially. Market researcher IDC estimated that the digital universe exceeded 281 exabytes in 2007 and will grow 10X by 2011. Enterprises in a number of industries, including media/entertainment, health care, and video surveillance, have already exceeded 100 terabytes of storage in use. Determining the appropriate long-term storage strategies for these industries will be a challenge as they realize the limitations of RAID.
The good news in addressing these data growth issues is the availability of low-cost processors and high-capacity drives. Combined, they provide great opportunities for disruptive innovations that will displace RAID.
One of the arguments in favor of internal clouds is that they can be purpose-built, meaning that you can apply a cloud architecture to a specific problem. One area where this makes a great deal of sense is storage, and specifically distributed storage.
I spoke with Cleversafe CEO Chris Gladwin to get some ideas on considerations in building a storage cloud. I wrote previously about the Chicago Museum of Broadcast Communications' use of Cleversafe's Dispersed Storage to store and deliver terabytes of digitized radio and television content.
1) Ensure that your performance will scale
Your customers won't be happy if system performance degrades as the amount of storage grows--and data storage requirements are always growing. Make sure that your storage cloud design fully distributes every functional element so that system performance doesn't degrade as the amount of storage increases 10x, 1,000x, or even a million times.
2) Build in large-scale reliability up front
If a single server fails once in a while, then in a pool of a thousand servers, something is failing nearly all the time. This means that it is many times easier to design a highly reliable small storage system than to design a highly reliable large storage system. You need to make sure up front that your cloud storage system is as reliable in storing petabytes or exabytes as it is in storing terabytes--or your system will eventually hit a wall.
3) Design for permanent operation
As Moore's law keeps marching forward, every piece of hardware you buy will be obsolete in three to four years. You need to design a system that maintains full operation, even as pieces of hardware are maintained, repaired, or replaced.
4) Don't try to spend your way out of a problem
If your system design doesn't get more cost-efficient as it grows, then you've got a serious problem. This is the reverse-ROI that every CIO fears, and storage can easily become a black hole.
5) Abstract whenever possible
The time-consuming system management tasks of today will become impossible workloads as your cloud storage system grows. Remove them with layers of abstraction so that system growth doesn't bury your system administrators.
6) Protect critical information from unauthorized access
Securing and protecting critical information is a key challenge for cloud architectures. Ensure that you use proven authentication methods for information access, and design the system to protect the data from inadvertent or malicious compromise.
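To put a rough number on point 2, here is a minimal Python sketch of how quickly "a server fails once in a while" turns into "some server is always failing" as the pool grows. The 1 percent per-server failure chance is a made-up figure for illustration, and the sketch assumes servers fail independently of one another.

```python
def p_any_failure(servers: int, p_single: float) -> float:
    """Chance that at least one of `servers` fails in a given period.

    Assumes each server fails independently with probability p_single.
    """
    return 1.0 - (1.0 - p_single) ** servers

P_SINGLE = 0.01  # hypothetical: each server has a 1% chance of failing in the period

for n in (1, 10, 100, 1000):
    print(f"{n:>5} servers: {p_any_failure(n, P_SINGLE):.2%} chance of at least one failure")
```

The per-server odds never change, but the system-level odds do, which is why reliability has to be designed in before the system gets big.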
Chicago's Museum of Broadcast Communications (MBC) collects, preserves, and presents historic and contemporary radio and television content with the purpose of educating, informing, and entertaining the public through its archives, public programs, screenings, exhibits, publications and online access to its resources.
MBC also runs Museum.tv--which stores and delivers terabytes of digitized radio and television content. Currently, they are featuring a 1984 senatorial debate including Roland Burris--whom you may recognize as the senator just appointed to fill Barack Obama's vacancy (check out the protests at the start of the debate and how the moderator handles it).
Through the years, the primary challenge the not-for-profit MBC faced was the lack of resources to put together a large enough storage infrastructure to handle the massive amount of digital data they had and needed to present. At one point, when their single-server storage setup didn't fit the bill, they had to suspend their online services.
All told, MBC has more than 100,000 hours of content that it needs to store and distribute. Due to its size and lack of structured data, video remains inherently difficult and expensive to store. And let's not get started on reliability and security issues--achieving data reliability and security at the terabyte level using traditional storage methods based on replication requires significant hardware capacity at multiple sites.
When Cleversafe CEO and MBC member Chris Gladwin heard about this, he contacted MBC to introduce Cleversafe's Dispersed Storage technology that could potentially solve their problems.
Here's how Dispersed Storage works: instead of copying data, Cleversafe divides it into "slices" and disperses them across a secure network to different geographic locations. Each slice contains too little information to be useful on its own, but any threshold number of the slices can be used to perfectly re-create the original data. Manageability? Yup. The combined size of all the slices is still less than that of multiple full copies of the original data.
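To make the slice-and-threshold idea concrete, here is a toy Python sketch. It is not Cleversafe's algorithm: it swaps in plain XOR parity, which yields k + 1 slices of which any k are enough to rebuild the data. Unlike the scheme described above, the data slices here are readable fragments, and only one lost slice can be tolerated. All function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def disperse(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal slices plus one XOR parity slice (k + 1 total)."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    slices = [padded[i * size:(i + 1) * size] for i in range(k)]
    slices.append(reduce(xor_bytes, slices))  # parity slice goes last
    return slices

def rebuild(slices: List[Optional[bytes]], length: int) -> bytes:
    """Re-create the data from any k of the k + 1 slices (at most one is None)."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("need at least k of the k + 1 slices")
    if missing:
        present = [s for s in slices if s is not None]
        slices = list(slices)
        # XOR of the surviving slices equals the missing one
        slices[missing[0]] = reduce(xor_bytes, present)
    return b"".join(slices[:-1])[:length]

data = b"1984 senatorial debate footage"
slices = disperse(data, k=4)
slices[2] = None  # pretend one storage site is unreachable
assert rebuild(slices, len(data)) == data
```

With k = 4, the five slices together take about 1.25 times the original size, versus 3 times for three full copies. That is the storage argument in miniature; a production dispersal scheme would use a stronger erasure code that survives several lost slices.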
One interesting sidebar to this--in addition to data storage, MBC also relies on Cleversafe for distribution instead of a separate content delivery network (CDN). When users view content on MBC's site, the data is pulled directly from Cleversafe and displayed via a media server in front of the Cleversafe hardware, saving MBC money and physical space without sacrificing performance or scalability for their end users.