July 21, 2008 4:55 PM PDT

Amazon S3: For now at least, sometimes you have to reboot the cloud

by Stephen Shankland

Amazon.com's Simple Storage Service, S3, hit a big pothole on the road to the glorious cloud computing future on Sunday, when an outage took the storage system offline for several hours. Should we be surprised?

No. In short, the computing industry is making up what's called cloud computing as it goes along, often with a server and networking architecture that's one part improvisation to two parts proven best practice. Frankly, it's notable to me that some services are as reliable as they are.

Some Amazon Web Services were down for hours on July 20.

(Credit: Amazon)

Computing practices tend to gravitate toward one of two poles. One is tight control, higher prices, and high reliability. The other is openness, lower cost, but some degree of flakiness. High-end mainframes and Unix servers can handle transaction loads that would crush most machines using Intel or AMD x86 processors, but they cost more and are less adaptable. Most of the cutting-edge, large-scale action on the Internet--including various cloud computing efforts--is happening with the more free-wheeling technology.

One company operating at colossal scale, Google, has concluded it's better to buy cheap x86 servers and write software that automatically paves over hardware failures. The bigger problem comes when a large system composed of many interacting components loses track of its self-conception, and rebooting a single system or swapping out a hard drive isn't sufficient.

Essentially, Amazon had to reboot S3. Here's how the company described its S3 problem in a statement:

"As a distributed system, the different components of S3 need to be aware of the state of each other. For example, this awareness makes it possible for the system to decide which redundant physical storage server to route a request to. We experienced a problem with those internal system communications, leaving the components unable to interact properly, and customers unable to successfully process requests. After exploring several alternatives, the team determined it had to take the service offline to restore proper communication and then bring service online again. These are sophisticated systems and it generally takes a while to get to root cause in such a situation," Amazon said. "We will be providing our customers with more information when we've fully investigated the incident."

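To make Amazon's description a little more concrete, here is a minimal Python sketch of the general idea, not Amazon's actual design. A component keeps a view of which redundant storage servers have recently reported in, and routes each request to one it believes is alive. The names `ClusterView`, `heartbeat`, and `route`, and the five-second timeout, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterView:
    """One component's picture of which storage servers are alive (hypothetical)."""
    timeout: float = 5.0                      # seconds before a silent server is presumed down
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, server: str, now: float) -> None:
        # Record that `server` reported in at time `now`.
        self.last_seen[server] = now

    def healthy(self, replicas: list, now: float) -> list:
        # A replica counts as healthy only if it was heard from within `timeout`.
        return [
            s for s in replicas
            if now - self.last_seen.get(s, float("-inf")) <= self.timeout
        ]


def route(view: ClusterView, replicas: list, now: float) -> str:
    """Pick the first replica this component believes is healthy."""
    candidates = view.healthy(replicas, now)
    if not candidates:
        # If the shared state is stale or wrong, no request can be served,
        # even when the underlying servers are actually fine.
        raise RuntimeError("no replica known to be healthy")
    return candidates[0]
```

In this toy version, a server that stops sending heartbeats is simply skipped. If the heartbeat messages themselves break down, though, every replica looks dead and every request fails, which is roughly the kind of situation Amazon's statement describes.
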
Afterward, Om Malik called cloud computing frail: "The S3 outage points to a bigger (and a larger) issue: the cloud has many points of failure--routers crashing, cable getting accidentally cut, load balancers getting misconfigured, or simply bad code." And he's right, to a degree, but there are three things that shouldn't be overlooked before writing cloud computing off as a failure.

• First, you should compare the problems of cloud computing to the alternatives, including running computing services in-house. Last I checked, corporate data centers also have crashing routers, bad code, and misconfigured load balancers.

• Second, you can expect reliability to increase as the companies providing cloud infrastructure and services explore the terra incognita.

• Third, don't confuse Web 2.0 with the foundational elements of cloud computing. A Web site that uses an online application at another site to mash up data from some other sites, then presents it using a service from yet another site, is indeed susceptible to numerous points of failure. But an infrastructure service such as Amazon S3 is, at least in theory, a more tightly controlled, single-purpose utility that can offer higher reliability.

That's not to excuse Amazon's outage or gloss over the effect it had on business partners reliant on it. After all, S3 is the sole part of Amazon Web Services that comes with a service level agreement to promise customers reliability.

But a little silver lining to this particular cloud problem is that Amazon is setting expectations at the right level: They said in a statement, "Any downtime is unacceptable, and we won't be satisfied until it is perfect."

Stephen Shankland writes about a wide range of technology and products, but has a particular focus on browsers and digital photography. He joined CNET News in 1998 and since then also has covered Google, Yahoo, servers, supercomputing, Linux and open-source software, and science. E-mail Stephen, or follow him on Twitter at http://www.twitter.com/stshank.
by zanely July 21, 2008 5:52 PM PDT
Why exactly are high-end mainframes and Unix servers "less adaptable"? To what, catastrophic system failure? Having a loopy system of bargain basement servers "lose track of its self-conception" is a textbook example of buy cheap, get cheap.
by zanely July 21, 2008 5:56 PM PDT
Why exactly are high-end mainframes and Unix servers "less adaptable"? To what, catastrophic system failure? Having a loopy array of bargain basement servers running a critical business application that "loses track of its self-conception" is a textbook example of buy cheap, get cheap.