March 16, 2009 3:02 PM PDT

What happens when clouds (inevitably) burst?

by Dave Rosenberg

Microsoft became a true cloud provider this past weekend as it experienced nearly 22 hours of downtime on its fledgling Azure Services Platform. The cause of the outage has not yet been disclosed to the general public or the Azure user community.

In contrast to on-premises systems, where the user is responsible for dealing with infrastructure problems, a big part of the cloud's appeal is that you don't have to manage your own systems or deal with the inevitable failures that occur.

It's easy to go off on a tangent about the necessity of monitoring the cloud, but the real issue is one of communication. If Microsoft wants to be taken seriously as a hosting provider--especially one defining a very nascent wave of technology--it needs to provide more information than what a single admin posts on an MSDN forum.

Of course, we should expect the same of other cloud providers like Amazon Web Services, Google App Engine, and Salesforce.com, all of which provide only the most basic uptime details (green=good, red=bad) with little to no explanation of what exactly is being monitored. The obvious argument is that users don't need to know...until something goes wrong and information is scarce.

Third-party services such as Hyperic's Cloudstatus.com provide additional insight, but cloud vendors themselves have to become much more diligent about reporting system status and its implications. How can vendors help ease concerns about outages?

• Visibility: Give customers immediate (real-time) visibility into the availability and performance of the services that you are delivering to them.

• Transparency: The performance and availability data needs to be freely available. Don't hide these metrics behind a login or some complex credentials-only mechanism. Companies that follow this rule will succeed, and they will set the standard and force the rest of the industry to follow.

• Trust: Above all else, report accurately. The most important asset a cloud services provider has is its reputation. Customers will forgive a service disruption--we all know computer systems have their periodic hiccups. Customers will not forgive anything that is less than honest and forthcoming.

This leads to one of the larger questions about cloud adoption: what happens when things go wrong? And are you prepared when things go bump in the night?

  • As a user, what is your backup plan if your cloud provider fails?
  • As a provider, what are you doing to communicate effectively with your users?
  • As a provider, do you have a runbook in place for a large-scale outage?

In the case of Azure, there aren't yet many commercial applications running. Still, it's Microsoft's responsibility to stay on top of the status of its services and to communicate constantly when things go wrong.

Availability outweighs any other perceived risk of using the cloud. Issues like security and latency have always been concerns, but nothing else matters if the cloud platform or application isn't available.

One interesting technical aside: Azure appears to have a required five-hour, full reboot of the system, which is probably fine now as the user base is fairly small. But just think about how long it would take to reboot all of Amazon Web Services. (An AWS total reboot is unlikely to happen as Amazon's service is built in zones. But hey, you never know.) Or how about the impact of 17 hours of intermittent availability plus 5 hours of reboot time in the context of AWS? Literally hundreds (thousands?) of businesses would wind up offline in some manner.
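
To put the aside's numbers in perspective, here is a rough back-of-the-envelope sketch in Python. The 17 and 5 hours come from the figures above; the 30-day measurement window and the choice to count the intermittent hours as full downtime are assumptions made for illustration, not anything Microsoft has reported.

```python
# Figures from the article: 17 hours of intermittent availability
# followed by a 5-hour full reboot.
INTERMITTENT_HOURS = 17
REBOOT_HOURS = 5

# Assumption (not from the article): measure against a 30-day month.
HOURS_IN_MONTH = 30 * 24


def availability_percent(downtime_hours: float, period_hours: float) -> float:
    """Return the percentage of the period during which the service was up."""
    return 100.0 * (period_hours - downtime_hours) / period_hours


# Worst-case assumption: treat the intermittent hours as fully down.
total_downtime = INTERMITTENT_HOURS + REBOOT_HOURS
monthly = availability_percent(total_downtime, HOURS_IN_MONTH)

print(f"{total_downtime} hours down -> {monthly:.2f}% availability for the month")
# prints "22 hours down -> 96.94% availability for the month"
```

Under those assumptions, a single weekend like this one would pull a month's availability below 97 percent.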

As Gavin Clarke wrote on The Register, "Microsoft wanted to offer people the full cloud experience. Well, now it has."


Dave Rosenberg dishes up "Software, Interrupted" with nearly 15 years of technology and marketing experience that spans from Bell Labs to multiple start-up IPOs to open-source enterprise software companies. He is co-founder of MuleSource and currently serves as the general manager of Hardy Way. He is a member of the CNET Blog Network and is not an employee of CNET. You can contact Dave via e-mail at softwareinterrupted@gmail.com or follow him on Twitter @daveofdoom.

3 Comments
by nscnet March 16, 2009 3:47 PM PDT
This is a brand new system, most people don't even know Windows Azure exists... either way, as I said, it's brand new, don't expect perfection, yet.
by calmor15014 March 16, 2009 9:12 PM PDT
Brand new or not, whenever it goes down, people are losing money, and Microsoft (and Amazon, Google, etc) owe it to their customers to provide at least some insight into what's happening.

It's good that this kind of issue occurred now - gives MS time to work out the bugs while the system is relatively small, but how long will it be out when the next bug or even external failure occurs 3 years from now?
by Len Bullard March 17, 2009 12:18 PM PDT
I'd be a lot more concerned about the rising costs of coarsened services as the deal can change at each contracting period and will. What you give up to the cloud is leverage and no one does that except the government.