Amazon has posted an apology for a disruption in its cloud services that hobbled Netflix's streaming services on Christmas Eve and said it is taking steps to prevent future disruptions.
Although Amazon's explanation of the outage didn't identify Netflix by name, the streaming service blamed Amazon Web Services last week for an outage that affected "many but not all devices" across the Americas.
"We want to apologize," Seattle-based Amazon said in its statement. "We know how critical our services are to our customers' businesses, and we know this disruption came at an inopportune time for some of our customers."
Amazon Web Services handles more than 835,000 requests per second for hundreds of thousands of customers in 190 countries, including 300 government agencies and 1,500 educational institutions. Amazon says it is "committed" to 99.95 percent uptime, but frequent disruptions in service could undermine customer confidence in the platform, which offers remote Web-based computing services.
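To put the 99.95 percent figure in perspective, the short sketch below converts an uptime percentage into the downtime it leaves room for. It assumes the commitment is measured over a 365-day year, which the statement does not specify, and the function name is purely illustrative.

```python
# Assumption: the uptime commitment is measured over a 365-day year.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours

def allowed_downtime_hours(uptime_percent, period_hours=HOURS_PER_YEAR):
    """Return the hours of downtime an uptime percentage leaves room for."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (1 - uptime_percent / 100)

if __name__ == "__main__":
    budget = allowed_downtime_hours(99.95)
    print(f"99.95 percent uptime allows about {budget:.2f} hours of downtime per year")
```

Under that assumption, the commitment leaves room for a little under four and a half hours of downtime a year.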
The disruption began shortly after noon Pacific time on December 24, when a developer accidentally deleted data during maintenance on the East Coast Elastic Load Balancing (ELB) system, which is designed to distribute traffic among servers. Normal service was restored by 10:30 a.m. PT on December 25, but at the height of the disruption, 6.8 percent of ELBs were affected, the company said.
As a result of the disruption, Amazon said it has adopted procedures that require developers to obtain specific Change Management approval from Amazon Web Services each time they access the Elastic Load Balancing system.
"We will do everything we can to learn from this event and use it to drive further improvements in the ELB service," the company said.
Netflix responded to Amazon's apology by saying it was proud of its streaming service's record of staying up even when other AWS-hosted services were down but added that it's working to improve its own line of defense.
"Netflix is designed to handle failure of all or part of a single availability zone in a region as we run across three zones and operate with no loss of functionality on two," the company said in a blog post this afternoon. "We are working on ways of extending our resiliency to handle partial or complete regional outages."
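Running across three zones while losing no functionality on two implies spare capacity in every zone. The sketch below shows that arithmetic under the simplifying assumption that load is spread evenly across the surviving zones. The function name and the load figure are hypothetical and do not describe how Netflix actually sizes its fleet.

```python
def required_zone_capacity(peak_load, zones, tolerated_failures):
    """Capacity each zone needs so the survivors can still carry peak load.

    Assumption: load is spread evenly across whichever zones remain up.
    """
    surviving = zones - tolerated_failures
    if surviving <= 0:
        raise ValueError("at least one zone must survive")
    return peak_load / surviving

if __name__ == "__main__":
    peak = 900  # hypothetical peak load, in arbitrary units
    per_zone = required_zone_capacity(peak, zones=3, tolerated_failures=1)
    normal_share = peak / 3
    print(f"Each zone must handle {per_zone:.0f} units, "
          f"versus {normal_share:.0f} in normal operation")
```

With three zones and one tolerated failure, each zone has to be sized for half of the peak load rather than a third, or 50 percent more than its normal share. Surviving the loss of an entire region would require a similar reserve in a second region.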