July 29, 2005 4:00 AM PDT

Big storage on the cheap

A correction was made to this story. Read below for details.
Enthusiasts learned to build their own PCs decades ago. Now you can assemble a storage system in your living room that could make the Pentagon jealous.

San Francisco-based Capricorn Technologies has crafted blueprints, available from the Internet Archive on an open-source basis, that effectively let people build multi-terabyte and multi-petabyte storage systems fairly inexpensively. The company also builds its own line of storage systems, called the PetaBox, and has landed deals with several universities and research departments with its low-budget approach.

News.context

What's new:
Capricorn Technologies has developed blueprints that allow users to build multi-terabyte and multi-petabyte storage systems. The company also builds high-capacity storage systems for the relatively low price of $2 a gigabyte.

Bottom line:
Universities and research departments have purchased Capricorn's storage systems, although competitors and industry observers say that those clients tend not to require the higher-performance (and much more expensive) storage systems needed by mainstream businesses. Still, Capricorn says it plans to become a major player in the storage market.

How cheap are they? Capricorn's storage systems cost about $2 a gigabyte, said the company's chief executive, C.R. Saikley. At that price, the cost breakdown would be about 65 cents for the gigabyte of storage and $1.35 for racks, software, networking, management tools and other components.

That means that a Capricorn 1-terabyte system (which consists of 1,000 gigabytes) would sell for about $2,000, while a 1-petabyte system (1,000 terabytes) would cost about $2 million.
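The arithmetic behind those figures can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above, assuming decimal units (1,000 gigabytes per terabyte) and a flat per-gigabyte price at every scale; the function and constant names are made up for the example.

```python
# Figures quoted in the article, held in integer cents to avoid float rounding.
DRIVE_CENTS_PER_GB = 65       # the raw gigabyte of disk
OVERHEAD_CENTS_PER_GB = 135   # racks, software, networking, management tools, etc.
GB_PER_TB = 1_000             # decimal units, as the article uses
TB_PER_PB = 1_000


def system_price_usd(capacity_gb: int) -> float:
    """Return the approximate system price in dollars for a given capacity.

    Assumes a flat per-gigabyte price, which is a simplification.
    """
    cents_per_gb = DRIVE_CENTS_PER_GB + OVERHEAD_CENTS_PER_GB
    return capacity_gb * cents_per_gb / 100


one_tb = system_price_usd(GB_PER_TB)
one_pb = system_price_usd(GB_PER_TB * TB_PER_PB)
print(f"1 TB: ${one_tb:,.0f}")
print(f"1 PB: ${one_pb:,.0f}")
```

Real quotes would presumably vary with configuration and volume, so the flat rate is best read as a rule of thumb.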

By contrast, a petabyte-class storage system from EMC might cost $20 a gigabyte, while similar systems from smaller companies might cost $10 a gigabyte, said Arun Taneja, an analyst with Taneja Group. A petabyte-class storage system will run into the millions, said an EMC spokesman.

"We're a fraction of the price of those guys," Saikley said. "Our goal is to become the low-cost leader in storage."

The growth of the Internet and of services such as Google's Gmail and Apple Computer's iTunes has caused a corresponding explosion in the amount of data that needs to be archived. A petabyte is a vast amount of storage space. It represents around 450,000 hours' worth of TV programming, or all the e-mail produced in the world on a single day, according to storage makers.

Mushrooming amounts of data have in turn fueled demand for large storage systems. Luckily, the drive industry has continued to improve its technology, doubling the density of hard drives every two years or so while dropping the price. While drive makers regularly lose money, consumers and others benefit.

Supersize me

The higher price of commercial storage systems comes with significant performance advantages, said an EMC spokesman. The systems that EMC specializes in are geared toward handling thousands of transactions simultaneously for hours on end without failure. A lot of university labs don't need that sort of horsepower.

"The challenge is providing the performance that scales with capacity," the spokesman said.

A Taneja analyst added that the low price raises red flags about Capricorn's commercial viability and performance of the systems, particularly for mainstream business users. Still, "two dollars is a miserably low price for disk-based storage," the analyst said. "The price they are talking about is about the price of the hardware."

The company emerged out of a collaboration between Brewster Kahle, founder of the Internet Archive, and Saikley. The archive, which strives to preserve books, Internet pages, music, TV shows and other digitized information, needed to expand its storage capacity but was constrained by its budget. The archive also wanted to keep power consumption down.

"We were unable to find what we knew was possible," said Saikley, who added, "I've been a personal friend of Brewster's since the Carter administration."

In 2004, Saikley devised a 100-terabyte storage system that consumed approximately 60 watts per terabyte.

Subsequently, he formed Capricorn and continued to tweak the technology. The company's flagship product is now the PetaBox TB64, a 64-terabyte storage system that consists of several 1U (1.75 inches high) modules slotted into a rack measuring approximately 2 by 2 by 6 feet. It consumes 50 watts per terabyte. The modules come with 400GB drives from Hitachi and processors from Via Technologies. Versions using Intel chips are also available.

In June, Capricorn shipped a petabyte worth of PetaBoxes to the Internet Archive. The petabyte system occupies about 16 racks and contains a few thousand hard drives.
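As a rough sanity check on those figures, the sketch below works out how many 64-terabyte racks and 400GB drives a petabyte implies, along with the power draw at 50 watts per terabyte. It assumes decimal units, that every drive is a 400GB model, and that the quoted capacities are raw (no allowance for redundancy or spares); the function names are hypothetical.

```python
# Figures quoted in the article.
GB_PER_TB = 1_000
TB_PER_PB = 1_000
RACK_CAPACITY_TB = 64     # PetaBox TB64
DRIVE_CAPACITY_GB = 400   # Hitachi drives
WATTS_PER_TB = 50


def ceil_div(a: int, b: int) -> int:
    """Integer division that rounds up."""
    return -(-a // b)


def racks_needed(capacity_tb: int) -> int:
    """Racks required if each rack holds RACK_CAPACITY_TB of raw storage."""
    return ceil_div(capacity_tb, RACK_CAPACITY_TB)


def drives_needed(capacity_tb: int) -> int:
    """Drives required, ignoring redundancy and spares (an assumption)."""
    return ceil_div(capacity_tb * GB_PER_TB, DRIVE_CAPACITY_GB)


def power_draw_watts(capacity_tb: int) -> int:
    """Total power at the quoted watts-per-terabyte figure."""
    return capacity_tb * WATTS_PER_TB


print(racks_needed(TB_PER_PB), "racks")
print(drives_needed(TB_PER_PB), "drives")
print(power_draw_watts(TB_PER_PB) / 1_000, "kW")
```

Under these assumptions the rack count agrees with the "about 16 racks" quoted above, and the drive count lands in the "few thousand" range.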

The Internet Archive submits all of its intellectual property to the open-source community. Since the storage system was designed on a commission from the organization, the organization owns the designs to the system and hence opened them to the public. Still, because customers don't necessarily want to assemble storage systems themselves, Capricorn is landing contracts. The company is also looking at ways to enhance its portfolio.

"I see us expanding our market presence and adding features and services," Saikley said.

 

Correction: This article incorrectly reported the availability of blueprints for a storage system. Capricorn created the blueprints, but the Internet Archive released them under an open-source license.

8 comments

Am I Missing Something Here??
Two dollars a gigabyte is cheap? That means a 50 gigabyte drive would cost $100. What am I missing here? Two thousand dollars for one terabyte? A 400GB Hitachi drive from Tigerdirect.com is only $310. Three of those would cost only $930 and provide 1.2 terabytes of total storage space. Do the math. How is two dollars a gigabyte any kind of a good deal?

I'd bet there is software that manages data across multiple smaller drives. If I am missing some key economic factor, or there is some important technical factor that I am ignorant of, I would appreciate an explanation. Regards to everyone.
Posted by Terry Gay (127 comments )
re: Am I Missing Something
A RAID array requires one drive to be used for parity. Total capacity is always n - 1 drives. So you will need 4 drives to provide a terabyte, which is what Capricorn describes for their GB 1000. The array also includes a 1 gigahertz CPU, motherboard, memory, chipset, power supply (which is probably also redundant), backplane, cooling fans, etc. Then there's their own product markup to make a few bucks.

http://www.capricorn-tech.com/gb1000.html
Posted by Stating (869 comments )
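The n - 1 rule in the reply above can be sketched as follows. This is a minimal illustration assuming a single-parity layout such as RAID 5; other RAID levels reserve different amounts of space, and nothing here is taken from Capricorn's actual design. The function names and the `parity_drives` parameter are hypothetical.

```python
def usable_capacity_gb(n_drives: int, drive_gb: int, parity_drives: int = 1) -> int:
    """Usable capacity when parity_drives' worth of space goes to parity."""
    if n_drives <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return (n_drives - parity_drives) * drive_gb


def drives_for_target(target_gb: int, drive_gb: int, parity_drives: int = 1) -> int:
    """Smallest drive count whose usable capacity reaches target_gb."""
    data_drives = -(-target_gb // drive_gb)  # ceiling division
    return data_drives + parity_drives


# Three 400GB drives leave less than a terabyte once one is given to parity.
print(usable_capacity_gb(3, 400))
# Drives needed to reach 1,000GB usable with 400GB drives.
print(drives_for_target(1_000, 400))
```

With 400GB drives, a usable terabyte takes four drives rather than three under this assumption, which is the point the reply makes.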
You Forgot
Yes, you can build one for cheaper, but compared to similar systems from EMC, Iomega, and others it is cheap. There is more to the cost than just the drives: you have the rack-mountable case, NIC, power supply, controller, and mobo + processor. There are consumer models from Kanguru that come in a small form factor and cost about a grand for 1 terabyte. These are pretty cool systems but not rack mountable.

http://jmaximus.blogspot.com
Posted by jmaximus9 (86 comments )
Media is only part of the system
Raw drive prices are great but that's only part of high-availability storage. Redundancy and the hardware & software to keep things running cost money. Over the last few years that has become the larger portion of the cost of large systems.

Scale it yourself. A single old PC can handle 4-8 drives with minimal creativity. Software RAID gets you redundancy. Hardware RAID would be better for a little more. What about 15-20 drives? Maybe hot-swapping? Warm spares? What about monitoring & notification? Try to provision 20-30 terabytes as one volume with zero downtime, and without staffing people to chase their tails with sub-par equipment. Eventually, power consumption plays a part. The decisions & costs stack up.

No doubt about it, drive costs are much better. Most companies can get by with these lower-cost systems, but the decisions and answers haven't changed all that much. A high-availability array will always cost more per byte than its individual storage media.
Posted by (22 comments )
Peta-Box is where you hide the camera...
...so they can't see it???
Posted by juchestyle (21 comments )
Partially agree with EMC, etc.
I partially agree with EMC and others that these systems are not for intensive data users. After looking through Capricorn's specs, I don't think pushing lots of data onto these arrays and retrieving lots of data from them will be their main use. Unless I missed something, they don't support any cutting-edge transport technologies such as 10 or 40 Gbps Ethernet, 4 Gbps FibreChannel, or a FibreChannel fabric.

These systems seem to be the next step for hard drive manufacturers to get people off tape. These systems are best used for stuff that is stored and occasionally used, but not for data intensive applications. These systems may replace robotic tape cabinets.

EMC's (and others') systems are overpriced for very large systems (petabytes) where the hardware/software implementation is done in the 10s - 100s of TB range and then just duplicated for whatever size you want. Unless you have a system that requires extremely high availability and throughput capabilities, $20 per GB is insane. For that kind of money in the petabyte range you can afford a completely custom implementation.

With the surge in storage requirements due to SOX and such (and new systems coming down the road which will utilize massive amounts of data) EMC and others will have to drop their prices or people will seriously consider options like this.
Posted by shadowself (202 comments )
the network is the computer
yah whatever.. the storage industry is crazy.. let them go on about their business.. iscsi is going to rule and centralized storage in some third world country ram factory is going to be less.. check out the storage metatags.. hmmm.. and the Enrons of the world are going to raise the tide and we will all flee to mexico.. got sand? got a san? wish I had a sandwich.. I think salton should do something with tortillas... yah!
Posted by (187 comments )
$2 a gig
I thought at first that it was a bogus price too, but it's the package that you're paying for. Like the article states, it breaks down to $.65 a gig for the storage and $1.35 for everything else. The everything else is hardware, racks, power sources, the fact that storage is on separate drives, and management software (probably an OS). True, you could rig up some stuff on IDE or SATA drives if you wanted to, but this is from a company, all together, all done for you, along with technical support and someone to ***** at when **** blows up.
Posted by amithatguy (3 comments )
 
