Amazon.com is blaming the latest outage to hit its Elastic Compute Cloud service on a lightning strike at one of its data centers.
In a statement on the Amazon Web Services "health dashboard," the online retailer and cloud-computing provider addressed concerns from some U.S. customers whose EC2 service had been disrupted around 6:20 p.m. Pacific Daylight Time on Wednesday.
"A lightning storm caused damage to a single Power Distribution Unit (PDU) in a single Availability Zone. While most instances were unaffected, a set of racks does not currently have power, so the instances on those racks are down," the company said initially on the health dashboard.
The disruption lasted about seven hours, during which time Amazon asked any affected customers to use alternative parts of the network. "Users with affected instances can launch replacement instances in any of the U.S. Region Availability Zones or wait until their instance(s) are restored," Amazon said.
The company later said the problem was confined to one "availability zone" and that the outage was localized. "We would like to reconfirm that this issue was limited to the single Availability Zone where this power issue occurred, and that a very small percentage of instances in that AZ were affected; this was not a generalized service issue," Amazon said.
Despite acknowledging that Amazon had dealt with the issue fairly efficiently, one user was concerned that a single lightning strike was able to bring down the service, if only in a limited way.
"I was under the impression that your architecture had more resiliency built into it. Yes we can use multiple availability zones to help with a single point of failure, but I thought that even within a single availability zone there was not a single point of failure for hardware/power," the user posted to an Amazon forum on the issue.
The EC2 service provides customers with virtual access to Amazon's computing infrastructure, using virtual machines that can be created using the Xen virtualization platform. First launched in a limited beta in August 2006, the EC2 service went fully live in October 2008.
Not including the latest issue, the service has suffered two major disruptions during that time, in October 2007 and February 2008. In June 2008, Amazon's main retail site suffered an outage that the company blamed on the complexity of its own systems.
A series of outages at other online or cloud computing services in recent months, including Google's Gmail and other applications, has led some critics to question whether the cloud approach to computing is really capable of providing the resilience required by enterprise users.
In mid-May, Google services were hit by an outage which apparently affected one in 10 of its users. In January, software-as-a-service pioneer Salesforce.com experienced an outage that disrupted all its customers for about an hour.
Andrew Donoghue of ZDNet UK reported from London.