News Blog

June 17, 2008 2:23 PM PDT

Google App Engine suffers outages

by Stephen Shankland

One advantage of cloud computing is that it's an expert's job to keep the centralized computing infrastructure up and running. But even experts have problems, and that's what's going on Tuesday with Google's App Engine.

The service has suffered outages Tuesday, according to a post on an App Engine mailing list. App Engine, launched in April and still in "preview release" mode, is a service that lets people create interactive Web applications written in the Python programming language.

"We've experienced several outages during the past 12 hours, the most recent of which started at 6:30 a.m. PDT and is still ongoing. During these outages, a significant percentage of requests resulted in errors. The errors are related to usage of the Datastore," the note said. "We're working hard to determine the cause of these outages and will continue updating as we make progress."

Google didn't immediately respond to a request for comment about the issue.

Update 5:25 p.m. PDT: Google fixed the problem, according to an update notice Google pointed out to me.

"At around 1:40 p.m. we were able to isolate the issue, and requests are currently serving normally," the update said. "This outage was the result of a bug in our datastore servers and was triggered by a particular class of queries. We have isolated the bug and we're currently working on a fix. Going forward, we're also working to further isolate queries so that in the future a bug like this won't affect the stability of the system as a whole."

(Via TechCrunch.)

June 9, 2008 10:14 AM PDT

Another Amazon outage, this time hitting U.K., too

by Stephen Shankland

These availability charts from Keynote Systems show Amazon's U.K. site, top, dropping largely off the Net, then gradually recovering. The U.S. site, the lower chart, showed more intermittent problems.

(Credit: Keynote Systems)

Amazon.com's Web site was offline again Monday, another significant service interruption following a two-hour outage on Friday.

As of 10:08 a.m. PDT on Monday, Amazon's main Web site showed the same "Http/1.1 Service Unavailable" error message that had appeared on Friday.

The e-commerce giant's Friday outage affected its Amazon.com site used by U.S. visitors. Monday's outage appeared to affect its U.K. site as well.

Pages on Amazon's U.S. and U.K. Web sites intermittently showed an error message like this Monday, as well as one saying Http/1.1 Service Unavailable.

(Credit: Amazon.com)

Update 10:26 a.m. PDT: Amazon.com is back, though the U.K. site still appears down to me. On Friday, however, the site was only intermittently available during recovery, so Amazon does not appear to be out of the woods yet.

Update 10:40 a.m. PDT: The "We're sorry!" error page that showed up Friday also is appearing on some other pages. The site is working for me, but not for an East Coast colleague.

Update 10:47 a.m. PDT: Amazon.co.uk now works for me again, though with sporadic errors on product pages.

Update 10:59 a.m. PDT: The company still hasn't responded to my requests for comment, but Amazon acknowledged problems on a forum for those who sell goods at the site: "We are currently experiencing an issue that is causing site performance issues. Our engineers are actively engaged on resolving this issue, and we will continue to provide updates until service has been restored," the company said.

Update 11:07 a.m. PDT: I'm getting intermittent errors again at the main pages, and some product pages of Amazon.com and Amazon.co.uk. So it's clear that as with Friday, recovery is a fits-and-starts affair, even an hour after the problem began.

Update 11:58 a.m. PDT: Amazon.com and most of Amazon.co.uk are working for me. One curiosity: when the site was really struggling, it was rare to even get the "Sorry!" error page.

Outages are bad, but as eBay learned nearly a decade ago, multiple outages are worse. Over its history, though, Amazon has generally had a reputation for reliability.

I added a graph from GrabPerf that shows the recent errors and slow-response times of Amazon.com.

Update 12:28 p.m. PDT: Amazon posted an "issue resolved" update to its seller community forum on Monday, but it isn't about Monday's problem. Instead, it's old news, saying that Amazon resolved Friday's problem on Friday. Still no word from the company about the second glitch.

Update 1:20 p.m. PDT: Keynote Systems, which monitors the availability of Web sites browsed from PCs and mobile devices, confirmed that Monday's outage hit the U.S. and U.K. sites.

The U.S. outage was a double whammy, said Shawn White, Keynote's director of external operations. The first problem lasted from 10:03 a.m. to 10:23 a.m. PDT, with site availability dropping to about 30 percent. A second, less severe problem occurred from 10:56 a.m. to 11:09 a.m. PDT, he said.

The U.K. glitch was a single, longer-lasting outage that began at 10:06 a.m. PDT and dropped the site to about 30 percent availability. Over about two hours, the site gradually recovered to 50 percent, then 70 percent, and now 98 percent availability.

As with Friday, White fingered human error as the most likely culprit, not a remote attack.

"It still looks like some type of user error or configuration glitch," he said. "The data just doesn't demonstrate any kind of network-level attack."

Update 1:40 p.m. PDT: Amazon confirmed the problem, though it didn't share much detail: "Some customers reported intermittent problems accessing Amazon retail Web sites on Monday morning. However, we are working to resolve the issues, and Amazon's Web services are not affected."

Update 2:50 p.m. PDT: A reader and I just got more timeouts on the U.S. site, with not even an error message showing. Things still aren't totally up to snuff, apparently.

Also, I added some nicer graphs from Keynote.

June 6, 2008 3:21 PM PDT

Amazon working again, but what went wrong?

by Stephen Shankland

Update 4:36 p.m. PDT with outside comment about possible causes of the Amazon.com outage.

Amazon posted an apology placeholder page for broken links during a two-hour outage.

(Credit: Amazon.com)

A two-hour Amazon.com outage is over. Now on to the post-mortem: what triggered the problem?

Amazon declared itself clear of the problem this afternoon. "The Amazon retail site was down for approximately two hours earlier today beginning around 10:25 a.m. The site (is) back up," the company said in a statement.

But as to the explanation, the company only hinted that its complicated computing infrastructure was, unsurprisingly, a culprit.

"Amazon's systems are very complex and on rare occasions, despite our best efforts, they may experience problems. We work to minimize any disruption and to get the site back as quickly as possible," the company said, declining to comment further.

Human error?
The most likely culprit was simple human error, in the estimation of Shawn White, director of operations for Keynote Systems, which monitors Web site availability.

He suggested that "some engineer might have made a particular change, not knowing it could cause a trickle-down effect" that eventually brought down the site.

For example, he said, somebody in charge of maintenance might have been directing Internet traffic to a particular group of servers, but selected the wrong group.

But at Amazon? "What I find still so surprising is it happened in the middle of the day. Typically you do that in off-peak hours," White said. "They rank on the top with performance and availability, consistently, time and time again."

Network attack?
Another possible explanation is an attack such as the distributed denial-of-service (DDoS) attack that struck Amazon and other high-profile sites in 2000. White thinks it unlikely, though, that a crushing load of network traffic brought Amazon down.

"These guys are experts at dealing with flash floods of users," White said, including those that routinely arrive during peak shopping days. "Usually, when you see a site going under because of traffic issues or a denial-of-service attack, you see a gradual slowdown in performance and drop in availability. Here we saw at 10:16 a.m. it completely dropped off 100 percent."

Soups Ranjan, a senior member of the technical staff of network protection and management company Narus, hasn't yet found any attack evidence.

"It doesn't seem to be the result of a network-initiated attack, at least from my preliminary analysis from our probes," Ranjan said.

Human error may not sound as gripping a tale as a network attack, but there's plenty of drama for the people responsible. And it's the career-limiting variety of drama, said Illuminata analyst Gordon Haff, who hazarded a guess that Amazon's problem involved its front-end Web servers.

The security group of WebSense, a Web site and communications protection company, also saw no evidence that Amazon's problem was security related.

CNET staff writer Robert Vamosi contributed to this report.

April 20, 2008 8:31 AM PDT

Twitter hiccups through a semi-outage

by Jonathan Skillings

The Twitter community may not be paying much heed to your posts this weekend, but it's not your fault.

The company behind the messaging service and Web 2.0 darling acknowledged late Saturday that some back-end changes had caused the service to stop showing tweets from a number of people. A post on the Twitter site from a person identified as "goldtoe, Official Rep," had this to say:

This is a result of some of the caching changes we made during last night's maintenance window. While those changes have made the service more stable over all, there have been some unintended effects.

We're aware of the problem and are working through it. It may take a bit to resolve but we are on it.

Twitter outage (Credit: ParisLemon.com)

The post, which has a box noting that "106 people have this problem," went on to say that "this solves the problem," but some hours later goldtoe posted an update: "Just to be clear - we know the problem is not yet fixed. We still have work to do."

Some Twitter users on the forum indicated that they had been able to find workarounds, but others continued to report posts going MIA.

Outside Twitter itself, the outages got some attention from blogger MG Siegler on his ParisLemon site:

I noticed a few people thinking the same thing as me today: is everyone taking a break from Twitter? People do get burnt out from the web after all and it was a pretty nice weekend day in a lot of cities. But no, Twitter is broken. People are updating and most of the updates are simply not coming through.

Click on some of your friends' profiles. You'll see they have updates, yet those updates are probably not in your Twitter stream. But some are, making things even more confusing, and making it harder for people to tell that Twitter is broken.

April 2, 2008 3:44 PM PDT

Yahoo Mail outages due to maintenance

by Elinor Mills

Some Yahoo Mail users were unable to log in to their e-mail accounts on at least two separate occasions this week due to planned maintenance work, Yahoo says.

On Monday, Yahoo Mail was inaccessible to an unknown number of users as a result of an "unexpected issue" that arose during routine scheduled maintenance, the company says. The issue was resolved, and by 9 a.m. Tuesday the outage was over.

Then, on Tuesday evening the mail service was unavailable for one hour and 40 minutes during a scheduled maintenance release coordinated with AT&T, according to Yahoo.

I'm kind of surprised that scheduled maintenance would disrupt any user's access to Yahoo Mail; I'd have thought a backup system would be employed. But what do I know?

April 1, 2008 3:06 PM PDT

Glitch limits access to Citibank accounts

by Greg Sandoval

Citibank, the country's largest bank, saw intermittent outages to its Web site Tuesday that prevented an unknown number of customers from accessing their accounts.

"Earlier today we experienced an issue that has resulted in intermittent customer access to Citibank.com," the company said in a statement. "As we are addressing this issue, some users are experiencing slow response times. We hope to be operating normally shortly."

A customer service representative of the bank, a division of financial services powerhouse Citigroup, said the issue began this morning and that the glitch has prevented customers from paying bills or performing any banking chores.

A bank spokesman did not disclose what caused the malfunction.

March 26, 2008 11:41 AM PDT

Netflix says sorry with 5 percent credit

by Greg Sandoval

Netflix has extended an apology, in the form of a discount, to customers in the wake of an 11-hour site outage.

On Monday, the Web's leading movie-rental service suffered its second extended outage in the past nine months. This time, the glitch led to customers receiving their DVDs a day late. Netflix is giving those who were inconvenienced a 5 percent credit on their accounts.

"We are sorry for any inconvenience this has caused," Netflix told customers via e-mail. "We will issue a (5 percent) credit to your account in the next few days."

Netflix has declined to say what caused the glitch, how many customers were affected, or what the total cost was to the company.

March 25, 2008 11:43 AM PDT

Netflix's minor glitches appear to be fixed

by Greg Sandoval

Update 1:33 p.m. PDT: Netflix has apparently fixed the site's recommendations and ratings.

Netflix customers saw only minor glitches a day after the movie rental service suffered an 11-hour Web site outage because of an undisclosed systems malfunction.

Customers were unable to access ratings and recommendations on Tuesday, according to Steve Swasey, a spokesman for the company. The company, however, appeared to have fixed the problems by the afternoon.

"This is part of the site that we haven't been able to get back online yet," Swasey said earlier in the day. "Otherwise the site is fully functioning. We're shipping and receiving."

That's in contrast to Monday, when the glitch hobbled Netflix's logistics and shipping systems as well as the Web site. The company was unable to fulfill all of the orders scheduled to go out.

On the blog Hackingnetflix.com, numerous commenters said they were informed by Netflix that shipments were held up a day but would resume on Tuesday.

Many of those who wrote described themselves as happy customers and said they weren't put out by the outage. But others couldn't understand why Netflix has been plagued by Web site trouble. In July, the movie rental company's site went down for 18 hours.

"They need a zero down-time system like any other decent company," said someone posting to Hackingnetflix.com who identified himself as Bagman. "It is absurd that their Web site shuts down for even a minute...Amazon and Google haven't been down 11 seconds, that I recall, let alone hours. Please get sorted, Netflix."

March 25, 2008 6:35 AM PDT

Red Sox fans freak over DirecTV outage

by Richard Defendorf

Updated 1:55 p.m. PDT with DirecTV's response.

DirecTV apparently had big trouble delivering ESPN2's Tuesday-morning coverage of the season opener between the Boston Red Sox and the Oakland A's, which was played in Tokyo.

The number of reader comments to a 6:29 a.m. blog post about the outage by Amalie Benjamin, who covers the Red Sox for The Boston Globe, soared past 120 within a couple of hours after the transmission failure began. The fans, naturally, are calling for congressional hearings on the matter.

While the problem seemed to have been remedied by 10 a.m. East Coast time, we were still waiting to hear from DirecTV about what exactly might have been the problem. It's probably no small comfort to Sox fans, though, that the team won 6-5 on a Manny Ramirez two-run double in the 10th.

DirecTV's response
In an e-mail, DirecTV's director of public relations, Robert Mercer, offered the company's apologies for the inconvenience, saying it was the result of "temporary technical difficulties" that did not affect the majority of channels and that have since been corrected.

In the case of the Red Sox game, any customers who have NESN or ESPN2 in HD were able to see the entire game. For customers who watch NESN in Standard Definition (SD), we were able to bring the channel back at the top of the seventh inning. For customers who watch ESPN2 in SD the channel came back on later, after the game was over.

Replays on both ESPN2 and NESN were planned for Tuesday afternoon Eastern time.

March 24, 2008 3:46 PM PDT

Netflix glitch to delay deliveries

by Greg Sandoval

Update at 6:15 p.m. PDT to add areas likely to see delivery delays.

Update at 7:55 p.m. to reflect that the site has since come back online.

Netflix customers expecting a little red package soon may be disappointed.

The largest online video-rental service has suffered a technical glitch that has knocked out its Web site as well as its logistics and delivery systems, according to Steve Swasey, a company spokesman.

The malfunction, the source of which the company won't reveal, began at about 7 a.m. PDT. The site came back online about 12 hours later, but the malfunction caused Netflix to miss the deadline to mail a large number of shipments scheduled to go out on Monday--affecting customers across the United States, according to Swasey. "We did send some shipments, but most of them will go out on Tuesday."

Swasey declined to specify what percentage of the company's more than 7.5 million customers would be affected.

The blackout was the second longest in company history. In July, Netflix suffered an outage that lasted longer than 18 hours. On that day, the company's shares fell 7 percent as the market punished Netflix for a drop in customers.

This time, the glitch came as Netflix's customer numbers are on the rise and its stock is soaring. Stock analysts upgraded Netflix on Monday, and the company closed trading at $38.18, up 5 percent. Over the past six months, the company's shares have doubled in value.

One of the differences between the two outages is that Netflix's logistics and shipping systems were not affected in July. During the latest glitch, Netflix initially continued to ship DVDs, but that changed sometime Monday afternoon.

(Credit: Screenshot of Netflix HTML source featuring deleted sentence)

In a message posted to its site, Netflix told customers not to worry because the company's "distribution centers are still sending and receiving DVDs." A check of the site's HTML source showed that the company rendered that sentence invisible sometime later.

"Our engineers have been feverishly working on repairing the problem all morning," Swasey said. "It was an unanticipated, unplanned outage, and we apologize to our customers."

Site outages are typically not a big deal, and any company can suffer one. But a blackout that lasts for more than an hour is rare, and one spanning several hours is rarer still.

Netflix, which has more than 7 million subscribers, said customers needn't worry about their stored movie picks: none of their information will be lost.
