June 12, 2009 10:57 AM PDT

Deduping: Killer app behind battle for Data Domain

by John Webster

Much drama has ensued since NetApp announced the intended acquisition of Data Domain on May 20 for the whopping sum of $1.5 billion.

EMC countered with a $30-per-share offer valued at $1.8 billion. NetApp then raised its offer to $30 a share, valued at $1.9 billion. Data Domain essentially said, "Thank you, EMC, but we like the new NetApp offer more than yours." EMC then claimed that it had been unfairly shut out of the bidding process and appealed directly to Data Domain employees.

NetApp countered with a claim that EMC's potential acquisition of Data Domain would fail a federal regulatory review, a claim that EMC has rebutted as it considers shoveling more cash into the fire to make its proposal more attractive.

To its suitors, Data Domain is now reportedly worth $1.9 billion. To give you some perspective on that figure, Oracle recently agreed to acquire Sun Microsystems for $7.4 billion. A $1.9 billion acquisition would mean that Data Domain is now worth about 26 percent of that number, yet its 2008 revenues of $274 million are a tiny fraction (roughly 2 percent) of the $13 billion in revenue Sun took in during 2008. Here's another relevant data point: EMC acquired VMware for a mere $635 million.
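The comparison above can be worked through in a few lines. This is only a back-of-the-envelope sketch using the figures quoted in this post; the variable names are illustrative, and the price-to-revenue multiples are a derived convenience, not numbers reported by any of the companies.

```python
# Figures as quoted in the article, in billions of US dollars.
data_domain_offer = 1.9         # latest reported offer for Data Domain
sun_price = 7.4                 # Oracle's agreed price for Sun
data_domain_revenue = 0.274     # Data Domain 2008 revenue
sun_revenue = 13.0              # Sun 2008 revenue (rounded, as in the article)

# How the two deals compare on price and on revenue.
price_share = data_domain_offer / sun_price
revenue_share = data_domain_revenue / sun_revenue

# Price paid per dollar of annual revenue (derived, for illustration).
data_domain_multiple = data_domain_offer / data_domain_revenue
sun_multiple = sun_price / sun_revenue

print(f"Data Domain offer vs. Sun price:     {price_share:.0%}")
print(f"Data Domain revenue vs. Sun revenue: {revenue_share:.1%}")
print(f"Price-to-revenue, Data Domain:       {data_domain_multiple:.1f}x")
print(f"Price-to-revenue, Sun:               {sun_multiple:.1f}x")
```

On these numbers, the bidders are paying roughly seven times Data Domain's annual revenue, while Oracle is paying well under one times Sun's.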

Deduplication is the storage world's new killer app. It's the great shrinking machine. Think of the old Steve Martin "let's get small" routine. It shrinks big data down to a small fraction of its original size--way more than is possible with the more common data compression routines. Why is that process now worth billions of dollars?

Most IT shops are moving away from tape as their primary backup medium in favor of disk. Deduping makes this migration economically viable by reducing the backup data footprint on disk arrays by a factor of 20 to 1, on average. You can't do that with tape. Nor can you get the input/output performance of disks from tape.
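To see why repeated full backups shrink so dramatically, here is a minimal sketch of a deduplicating store. It assumes fixed-size chunks identified by SHA-256 digests; the class name, chunk size, and sample data are all hypothetical, and this is a toy model rather than how Data Domain or any other vendor actually implements deduplication.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, chosen only for illustration


class DedupStore:
    """Toy store that keeps each unique chunk once and backups as pointer lists."""

    def __init__(self):
        self.chunks = {}    # digest -> chunk bytes, stored a single time
        self.backups = []   # one list of digests ("pointers") per backup

    def backup(self, data: bytes) -> None:
        recipe = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store only if not seen before
            recipe.append(digest)
        self.backups.append(recipe)

    def restore(self, index: int) -> bytes:
        return b"".join(self.chunks[d] for d in self.backups[index])

    def logical_bytes(self) -> int:
        # Size of all backups as if each were stored in full.
        return sum(len(self.chunks[d]) for recipe in self.backups for d in recipe)

    def physical_bytes(self) -> int:
        # Space actually consumed by unique chunks (pointer overhead ignored).
        return sum(len(chunk) for chunk in self.chunks.values())


def make_chunk(value: int) -> bytes:
    """Build a deterministic chunk whose content depends on `value`."""
    return value.to_bytes(4, "big") * (CHUNK_SIZE // 4)


# Twenty nightly full backups of 100 chunks, with one chunk changing each night.
base = [make_chunk(i) for i in range(100)]
store = DedupStore()
for night in range(20):
    chunks = list(base)
    chunks[night] = make_chunk(1000 + night)
    store.backup(b"".join(chunks))

ratio = store.logical_bytes() / store.physical_bytes()
print(f"logical:  {store.logical_bytes()} bytes")
print(f"physical: {store.physical_bytes()} bytes")
print(f"reduction: {ratio:.1f} to 1")
```

In this contrived run the store holds 120 unique chunks for 2,000 chunks' worth of backups, a reduction of roughly 17 to 1. The ratio here is an artifact of the sample data; real-world results depend on how much the data changes between backups and on how the product chunks it.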

But that's not all that deduping does. It can be run against primary data storage streams to reduce the data footprint within expensive primary storage arrays. NetApp, among other vendors, supports this. Running it here may amount to the functional equivalent of buying another array, given the capacity that's saved as a result. When IT budgets are constrained, and storage is one of your top budget priorities, that's a big deal.

One can also dedupe archival storage, making the disk a repository for archival data that may need fast accessibility on a periodic basis--like when your corporate attorney needs to find exculpatory e-mails from three years ago and needs them yesterday.

So now everyone has to dedupe. Every major storage vendor, from EMC to Hewlett-Packard to IBM, now offers at least one of the many dedupe options available, including in-line and post-process variants. IBM, for example, offers four options.

In spite of all its high-profile competition, Data Domain has been the acknowledged leader in integrating deduplication into the backup process. It offers disk-based deduplicated storage arrays for heterogeneous backup environments, and it leads all contenders in this space, in terms of market share, by a wide margin.

Does a leading position in a killer app justify a $1.9 billion valuation for a relatively unknown company mining a niche storage opportunity? Stay tuned. The executives at EMC and NetApp hate to lose, and EMC may yet win the heart of the fair maid named Data Domain.

John, a senior partner at Evaluator Group, has 30 years of experience in enterprise IT storage, spanning mainframe and open systems environments. He has served as principal IT adviser at Illuminata and has held analyst positions at IDC and Yankee Group Research. He also co-authored the book "Inescapable Data: Harnessing the Power of Convergence." John is a member of the CNET Blog Network and is not an employee of CNET.
4 Comments
by bj1126 June 12, 2009 1:14 PM PDT
I am watching this one closely. It will be interesting to see what comes of this.
by JohnSWebster June 19, 2009 9:08 AM PDT
It was once the case that dedupe was only for backup. Then NetApp allowed users to dedupe primary data stores under ONTAP. Other vendors are following suit. There will be a number of variations on this theme for different applications, I believe. I first encountered dedupe when I met a company named Rocksoft back in 2003. Their patents for a process they called Data Coalescence were filed in 1992. They were bought by ADIC, which in turn was bought by Quantum. If you're watching this space closely, that should ring some bells. JW
by Seaspray0 June 14, 2009 7:21 AM PDT
You need to look at the way they shrink data. Basically, they do what's known as a "single instance store" of data. If you have 10 GB of data, the first full backup takes up to 10 GB (compression usually gets you about 30% savings). The next time you do a full backup, the app sees that you're backing up the same data (sectors are identical) and only puts in pointers to the first data. So... your second full backup of the same data takes a very small fraction of space since you only retain one single instance of data and point to it over and over again. If your data is fairly static, you can do hundreds of backups and consume roughly 11 to 12 GB of space (one instance of data and lots of pointers). On tape, you would be consuming roughly 1000 GB of space. "Cool!" you would say, and yes it is. But there is a danger as well. Each tape holds its own instance of the data. For 100 tapes, that would be 100 instances. Redundant? Yes it is. But I'm also concerned about having only 1 instance being referenced 100 times. All it takes is one failure in one instance, and the data is gone for all 100 full backups. For that reason, Data Domain doesn't use RAID 5. They use the next revision of RAID, where you have 2 parity drives per logical disk instead of one (allowing 2 disks to fail and still retain the data). I would prefer to include a mirror of that as well. I want more than one instance. Data Domain gives you the option of duplicating to another appliance and this satisfies my desire, but the cost is double for that solution. Adding the second appliance makes tape very competitive in price. I will still keep an eye on the cost. If Data Domain lowers their prices to where doubling up the appliances is cheaper than tape, I'd very much consider it.
by JohnSWebster June 19, 2009 9:17 AM PDT
While I'm not going to argue the case for DataDomain or their pricing practices, I will say that with higher density disk on the near-term horizon, the cost per GB will likely come down to the point where your analysis tells you it's time to jump in. Personally, I think we're at that point now, which is why there is the huge interest in DataDomain. JW