Amazon.com is blaming the latest outage to hit its Elastic Compute Cloud service on a lightning strike at one of its data centers.
In a statement on the Amazon Web Services "health dashboard," the online retailer and cloud-computing provider addressed concerns from some U.S. customers whose EC2 service had been disrupted around 6:20 p.m. Pacific Daylight Time on Wednesday.
"A lightning storm caused damage to a single Power Distribution Unit (PDU) in a single Availability Zone. While most instances were unaffected, a set of racks does not currently have power, so the instances on those racks are down," the company said initially on the health dashboard.
The disruption lasted about seven hours, during which time Amazon asked any affected customers to use alternative parts of the network. "Users with affected instances can launch replacement instances in any of the U.S. Region Availability Zones or wait until their instance(s) are restored," Amazon said.
The company later said the outage was localized to a single "availability zone." "We would like to reconfirm that this issue was limited to the single Availability Zone where this power issue occurred, and that a very small percentage of instances in that AZ were affected; this was not a generalized service issue," Amazon said.
Despite acknowledging that Amazon had dealt with the issue fairly efficiently, one user was concerned that a single lightning strike was able to bring down the service, if only in a limited way.
"I was under the impression that your architecture had more resiliency built into it. Yes we can use multiple availability zones to help with a single point of failure, but I thought that even within a single availability zone there was not a single point of failure for hardware/power," the user posted to an Amazon forum on the issue.
The EC2 service provides customers with virtual access to Amazon's computing infrastructure, using virtual machines that can be created using the Xen virtualization platform. First launched in a limited beta in August 2006, the EC2 service went fully live in October 2008.
Not including the latest issue, the service has suffered two major disruptions during that time, in October 2007 and February 2008. In June 2008, Amazon's main retail site suffered an outage that the company blamed on the complexity of its own systems.
A series of outages at other online or cloud computing services over recent months, including Google's Gmail and other applications, has led some critics to question whether the cloud approach to computing is really capable of providing the resilience required by enterprise users.
In mid-May, Google services were hit by an outage which apparently affected one in 10 of its users. In January, software-as-a-service pioneer Salesforce.com experienced an outage that disrupted all its customers for about an hour.
Here is the string of messages from Amazon about this week's outage.
(Credit: Amazon.com)
Andrew Donoghue of ZDNet UK reported from London.
Amazon.com's cloud-computing arm has added new features to help users monitor cloud resources, adjust capacity, and balance traffic loads.
In an announcement Monday, Amazon Web Services unveiled a public beta of the three new features: the CloudWatch monitoring service, Auto Scaling for on-demand capacity adjustments, and Elastic Load Balancing for redistributing traffic.
The new features are available immediately to users in the U.S., according to a company blog, with availability in Europe set to follow in the next few months.
"You can use these services to make your...applications perform better without sacrificing application control, freedom of development, choice of tools, speed of deployment, or any other kind of flexibility," according to the blog post.
For the past three years, Amazon Web Services has been offering on-demand computing and storage through its Elastic Compute Cloud service, known as EC2, and its Simple Storage Service, known as S3. The company says it deals with 80,000 work requests per second and stores 52 billion objects.
Toby Wolpe of ZDNet UK reported from London.
This was originally posted at ZDNet's Between the Lines.
A correction has been made to this story. See details below.
Amazon on Thursday announced a new cloud computing service that uses Hadoop, a free software framework, to crunch tons of data.
The service, called Amazon Elastic MapReduce, is designed for businesses, researchers and analysts trying to conduct data-intensive number crunching. Hadoop, which is used by companies like Yahoo, is being pushed into the enterprise data center by start-ups like Cloudera.
Correction, 7:15 a.m. PDT: This story initially miscast Google's connection to Hadoop. Google invented and uses the MapReduce technology, but it doesn't use Hadoop, an open-source implementation of MapReduce. At least it doesn't use it broadly. It has its own in-house version.
Amazon's Hadoop framework runs on the company's Elastic Compute Cloud (EC2) and Simple Storage Service (S3). The general idea is that customers can use MapReduce to pay by the sip as they do things like index the Web, mine data, conduct financial analysis, simulation and bioinformatics research.
In its statement, Amazon said:
Amazon Elastic MapReduce creates data processing job flows that are executed by Hadoop software on the web-scale infrastructure of Amazon EC2. The service automatically launches and configures the number and type of Amazon EC2 instances specified by customers. It then kicks off a Hadoop implementation of the MapReduce programming model, which loads large amounts of user input data from Amazon S3 and then subdivides it for parallel processing using Amazon EC2 instances. As processing completes, data is re-combined and reduced into a final solution, and the results deposited back into Amazon S3. Users can configure, manipulate, and monitor job flows through web service APIs or via the AWS Management Console.
That roughly translates to: Bring your data mining to us.
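To make the flow in Amazon's statement concrete, here is a minimal, single-machine sketch of the MapReduce pattern: input is subdivided, each piece is mapped independently, and the results are recombined and reduced. It is an illustration only, not Hadoop or Amazon's API; the function names (`split_input`, `map_phase`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import chain


def split_input(records, num_workers):
    """Subdivide the input into one chunk per (hypothetical) worker."""
    return [records[i::num_workers] for i in range(num_workers)]


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def reduce_phase(pairs):
    """Reduce step: combine all pairs that share a key into one total."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(records, num_workers=2):
    """Run the whole job flow; chunks are processed one after another
    here, whereas a real cluster would process them in parallel."""
    chunks = split_input(records, num_workers)
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(chain.from_iterable(mapped))


if __name__ == "__main__":
    print(run_job(["the cloud is elastic", "the cloud is down"]))
```

In the service Amazon describes, the chunks would come from S3 and the map work would be spread across EC2 instances; the sketch keeps everything in memory to show only the shape of the computation.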
MapReduce is a separate service and here's the pricing in the U.S.:
(Credit: Larry Dignan/ZDNet)
This was originally published at ZDNet's Between the Lines.
Amazon has tweaked its Elastic Compute Cloud (EC2) pricing model to be more enterprise-friendly. The move is significant enough to sway IT executives to adopt more of Amazon's Web Services, especially when they have tight budgets.
Amazon on Thursday announced reserved pricing for its EC2 instances. Simply put, customers can reserve instances for one-year and three-year terms as if they owned the hardware. Enterprises can guarantee they have an EC2 instance for computing power they know they'll use and buy on the spot market to account for spikes at the usual Amazon rate. Under Amazon's model, customers only pay for the computing power they use even if instances are reserved.
Enterprises have begun to use Amazon's Web services more in recent months and the economic downturn has only accelerated those moves. However, enterprises view the Amazon EC2 as a way to add capacity for spikes and pilots. These corporations haven't been swayed to put more of their infrastructure on Amazon's platform because they are still evaluating the feasibility and return on investment.
Peter DeSantis, general manager of Amazon's EC2 business, said that the company's latest pricing may help the return on investment case for enterprises. Reserving instances under one-year terms saves customers roughly 30 percent. A three-year term adds up to about a 50 percent savings.
"We had a lot of the feedback from enterprise customers. They want to know that their data centers have boxes when they need them," explained DeSantis, who noted that there is a big menu of pricing options. "These broad options for enterprises appeal to what they are used to. The pricing model resonates to them. unanticipated needs. Best of both worlds."
In practice, technology executives can now have the comfort of knowing they have a set number of instances reserved without the added investment of adding upfront capacity.
Here's a look at the reserved vs. spot market EC2 pricing:
(Credit: ZDNet)
Depending on how enterprises mix and match these pricing alternatives, the savings can add up. A Linux/Unix box paid for hourly will run you 40 cents an hour for a standard large on-demand instance. That equates to about $3,500, assuming 24/7/365 coverage. If you reserve that instance over a year term Amazon will charge you $1,300 per instance plus the 12 cents per hour. That comes out to $2,351. Multiply those savings ($1,152) over 100 instances and you get some real savings: $115,000 a year or so.
A three-year option under the same assumptions yields an on-demand price of $3,500 a year ($10,500 over three years). A reserved instance is $2,000 plus 12 cents an hour. With around-the-clock coverage you get a price of $5,153 per instance over three years. Savings of reserved compared with on-demand, per instance: about $5,350. Across 100 instances, the savings come out to roughly $535,000 over three years.
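The arithmetic in the two paragraphs above can be checked with a short script. This is a sketch using only the rates quoted in the article (40 cents an hour on demand, 12 cents an hour reserved, and upfront fees of $1,300 and $2,000); the function names are made up for illustration, and round-the-clock usage is assumed throughout.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours of round-the-clock use


def on_demand_cost(hourly_rate, years):
    """Total cost of paying the on-demand hourly rate for the whole term."""
    return hourly_rate * HOURS_PER_YEAR * years


def reserved_cost(upfront_fee, hourly_rate, years):
    """One-time reservation fee plus the discounted hourly rate."""
    return upfront_fee + hourly_rate * HOURS_PER_YEAR * years


def savings(years, upfront_fee, on_demand_rate=0.40, reserved_rate=0.12):
    """Per-instance savings of reserving versus paying on demand."""
    return (on_demand_cost(on_demand_rate, years)
            - reserved_cost(upfront_fee, reserved_rate, years))


if __name__ == "__main__":
    for years, fee in [(1, 1300), (3, 2000)]:
        saved = savings(years, fee)
        share = saved / on_demand_cost(0.40, years)
        print(f"{years}-year term: ${saved:,.2f} saved per instance ({share:.0%})")
```

Without rounding the on-demand price to $3,500 a year, the per-instance savings work out to about $1,153 for the one-year term and about $5,358 for the three-year term, or roughly 33 percent and 51 percent, which is in line with the approximate 30 percent and 50 percent figures DeSantis cited.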
Add it up and it's pretty clear that Amazon is using the downturn to disrupt the market and entice enterprise customers. Recessions force enterprise buyers to get creative, and cloud computing is already on the radar. Even so, Forrester reports that only 5 percent of large enterprises have implemented cloud computing instances (but 46 percent are interested). Part of the problem is that the cloud model and services like Amazon EC2 are fairly new, and there are some uncertainties around matters such as licensing and pricing.
However, that's changing. Amazon has partnered with almost all of the big enterprise players--IBM, Capgemini, Salesforce.com, Sun's MySQL and OpenSolaris, Oracle and Red Hat--and clearly is a player in corporate cloud computing.
Meanwhile, Amazon is taking its approach to e-tailing--customers first, adjust to meet needs and keep innovating--to the enterprise market. As a result, Amazon is taking away the excuses to not try its cloud services.
A central part of Amazon's online computing foundation is growing up.
The Elastic Compute Cloud, a service that gives customers on-demand access to Linux servers, is now out of beta testing, said Jeff Barr, evangelist for the collection of online options collectively called Amazon Web Services.
"Amazon EC2 is now in full production," Barr said in a blog post Thursday. And as promised, EC2 now offers Windows in a beta test, joining Sun Microsystems' OpenSolaris and Solaris Express Community Edition.
Along with those moves, EC2 now comes with a service level agreement, a formal commitment that the service will be available at least 99.95 percent of the time. This type of agreement makes it easier for businesses to place faith in the service. Previously, the only AWS component with a service level agreement was the Simple Storage Service (S3), which provides online data storage.
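To put the 99.95 percent figure in perspective, the short sketch below converts an availability percentage into the downtime it still permits. The article does not say over what period the commitment is measured, so the one-year default here is an assumption, and `allowed_downtime_hours` is a hypothetical helper rather than anything Amazon publishes.

```python
HOURS_PER_YEAR = 24 * 365  # assumed measurement period: one 365-day year


def allowed_downtime_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Hours of downtime that still satisfy the availability percentage."""
    return period_hours * (1 - availability_pct / 100)


if __name__ == "__main__":
    hours = allowed_downtime_hours(99.95)
    print(f"99.95% availability allows about {hours:.2f} hours of downtime per year")
```

Under that assumption, a 99.95 percent commitment leaves room for a little over four hours of downtime in a year.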
Customers pay for AWS according to how much they need: more servers, more storage space, and more network capacity means more charges. But unlike with computing infrastructure built in-house, when customers don't need it anymore, they can stop paying for it. AWS has had outages, but it continues to gain in popularity, and Amazon has been lowering some AWS prices.
Amazon collects multiple gigabits of monitoring data each second for its Elastic Compute Cloud service.
(Credit: Amazon.com)
Barr also described features that signal growing sophistication for AWS overall in 2009 and that should make it easier to administer AWS, either manually or by letting it run itself better. Barr listed four areas:
Management Console: The management console will simplify the process of configuring and operating your applications in the AWS cloud. You'll be able to get a global picture of your cloud computing environment using a point-and-click web interface.
Load Balancing: The load-balancing service will allow you to balance incoming requests and traffic across multiple EC2 instances.
Automatic Scaling: The auto-scaling service will allow you to grow and shrink your usage of EC2 capacity on demand based on application requirements.
Cloud Monitoring: The cloud-monitoring service will provide real time, multidimensional monitoring of host resources across any number of EC2 instances, with the ability to aggregate operational metrics across instances, Availability Zones, and time slots.
In a separate blog post, Amazon Chief Technology Officer Werner Vogels described some of Amazon's work in ensuring reliability and efficiency.
"We relentlessly measure every possible resource usage parameter, every application counter, and every customer's experience. Many gigabits per second of monitoring data flows continuously through the Amazon networks to make sure that our customers are getting serviced at the levels they can expect and at an efficiency level the business desires," Vogels said.
Among the customers using the Windows version of EC2 are Autodesk, RenderRocket, and Eli Lilly, Amazon said.
"This is a huge step forward in maximizing our results relative to IT spend, and now that Amazon EC2 runs Windows and SQL Server, we have even greater flexibility in the kinds of applications we can build in the AWS cloud," Dave Powers, an Eli Lilly associate information consultant who uses the service to process research data, gushed in a statement.
Autodesk uses EC2 for back-end data processing tasks, said Mike Haley, a senior architect of search engineering, and RenderRocket uses the service for 3D graphics work for TV and movies, Amazon said.
Customers affected by Sunday's outage of Amazon's Simple Storage Service, an online data storage plan, won't have to do anything to get credit for the hours-long glitch.
Some Amazon Web Services were down for hours on July 20.
(Credit: Amazon)
"We'll be announcing on the developer forum momentarily that we'll be waiving our standard SLA (service-level agreement) process and applying the appropriate service credit to all affected customers for the July billing period," the company said Monday evening in a statement about the S3 outage. "Customers will not need to send us an e-mail to request their credits, as these will be automatically applied. This transaction will be reflected in our customers' August billing statements."
S3 provides an online mechanism where customers can pay to store data according to the amount they need stored. It's one of a host of Amazon Web Services, but it's the only one so far covered by a service-level agreement that promises high reliability.
Amazon's S3 and the Elastic Compute Cloud (EC2) are two prominent examples of the concept of cloud computing, in which specialists offer online services on which others can base their own applications. Another variety of cloud computing offers more specific services such as online e-mail or office suites from Zoho, Google, Adobe, and Yahoo.
Amazon.com's Simple Storage Service, S3, hit a big pothole on the road to the glorious cloud computing future Sunday, with an outage taking the storage system offline for several hours. Should we be surprised?
No. In short, the computing industry is making up what's called cloud computing as it goes along, often with a server and networking architecture that's one part improvisation to two parts proven best practice. Frankly, it's notable to me that some services are as reliable as they are.
Some Amazon Web Services were down for hours on July 20.
(Credit: Amazon)
Computing practices tend to gravitate toward one of two poles. One is tight control, higher prices, and high reliability. The other is openness, lower cost, but some degree of flakiness. High-end mainframes and Unix servers can handle transaction loads that would crush most machines using Intel or AMD x86 processors, but they cost more and are less adaptable. Most of the cutting-edge, large-scale action in the Internet--including various cloud computing efforts--is happening with the more free-wheeling technology.
One company operating at colossal scale, Google, has concluded it's better to buy cheap x86 servers and write software that automatically paves over hardware failures. The bigger problem comes when a large system composed of many interacting components loses track of its self-conception, and rebooting a single system or swapping out a hard drive isn't sufficient.
Essentially, Amazon had to reboot S3. Here's how the company described its S3 problem in a statement:
"As a distributed system, the different components of S3 need to be aware of the state of each other. For example, this awareness makes it possible for the system to decide which redundant physical storage server to route a request to. We experienced a problem with those internal system communications, leaving the components unable to interact properly, and customers unable to successfully process requests. After exploring several alternatives, the team determined it had to take the service offline to restore proper communication and then bring service online again. These are sophisticated systems and it generally takes a while to get to root cause in such a situation," Amazon said. "We will be providing our customers with more information when we've fully investigated the incident."
Afterward, Om Malik called cloud computing frail: "The S3 outage points to a bigger (and a larger) issue: the cloud has many points of failure--routers crashing, cable getting accidentally cut, load balancers getting misconfigured, or simply bad code." And he's right, to a degree, but there are three things that shouldn't be overlooked before writing cloud computing off as a failure.
First, you should compare the problems of cloud computing to the alternatives, including running computing services in-house. Last I checked, corporate data centers also have crashing routers, bad code, and misconfigured load balancers.
Second, you can expect reliability to increase as the companies providing cloud infrastructure and services explore the terra incognita.
Third, don't confuse Web 2.0 with the foundational elements of cloud computing. A Web site that uses an online application at another site to mash up data from some other sites then present it using a service from yet another site is indeed susceptible to numerous points of failure. But a single-purpose infrastructure such as Amazon S3 is at least in theory a more tightly controlled, single-purpose utility that can offer higher reliability.
That's not to excuse Amazon's outage or gloss over the effect it had on business partners reliant on it. After all, S3 is the sole part of Amazon Web Services that comes with a service level agreement to promise customers reliability.
But a little silver lining to this particular cloud problem is that Amazon is setting expectations at the right level: They said in a statement, "Any downtime is unacceptable, and we won't be satisfied until it is perfect."