Nice financial results notwithstanding, Google had some trouble late today with its App Engine service for online applications.
Overall availability of the cloud-computing foundation, which can run online applications written in Python or Java, dropped at least as low as 88.1 percent, according to the Google App Engine status dashboard.
The dashboard showed problems serving Java programs--more than half of attempts to request an app's Web page resources produced errors at one point. There also were delays in using a programming interface to manage tasks.
In Google's postmortem analysis, the company determined that there was an overall error rate of 1.9 percent for App Engine, affecting only 0.005 percent of traffic to the service at the problem's peak.
The problems appeared not to be permanent, though, and Python applications didn't show the same troubles. According to the dashboard, problems began after 9 p.m. PT, and service was back to normal at about 11 p.m. For some customers, however, the trouble started several hours earlier with a maintenance outage.
"We were taken down for several hours tonight by this," said Jason Cahill, a You vs. the Internet engineer who works on the Wordament game. "We had a lot of unhappy users and we felt powerless to do anything about it."
After the problem faded, so did two question-mark icons that had been on the App Engine status dashboard indicating that Google was investigating what was going on.
After CNET asked why the dashboard didn't reflect the problem, Google apparently had second thoughts. "The team is investigating the cause of the issue last night. The status dashboard has been updated to reflect a service outage," Google said in a statement.
Cahill had this description of the outage:
We've been available to customers since April 1, and we've only ever had one outage before--which was planned weeks in advance by Google and on their blog. This time, there was almost no warning at all: They posted less than 3 hours before going offline. And, while their scheduled downtime was only supposed to be for 1 hour, they took us out from 5 p.m. until 11 p.m. PST. Our service was limping to life and then falling over for some of it; and then completely dead from 9 p.m. until 11. Even with our best attempts to harden our app for downtime, this was producing errors unlike we've ever seen or expected. The whole thing was handled very poorly by Google.
He wasn't the only one to notice the problem: "Microsoft very graciously reached out to us tonight and offered to help us move to Azure. Since we are Windows Phone exclusive today, they are extra interested in helping us succeed," Cahill said. "There's no way we could have gotten this big and successful without a cloud platform, but this event made us think twice about 'cloud redundancy.'"
However, Google said, App Engine served most applications normally again after the initial, scheduled maintenance period ended and before the later unplanned problem. And the company did publish a warning three days ahead of time, not just immediately beforehand.
App Engine can host applications written in Java, Python, and, as a later addition, Google's own Go programming language. These programs can tap into Google services such as online storage.
Its services are intermediate between Amazon Web Services, which provides a lower-level interface to computing nuts and bolts, and some higher-level services such as Google Docs or Spotify, which provide finished products that consumers can use over the Net.
Moving applications to the Net can cause problems, as shown by a massive, long Amazon Web Services outage in April. But the pitfalls of cloud computing must be balanced against the convenience and power--App Engine, AWS, and other services let customers ramp up operations very quickly and spend money only for resources used--and against the expenses and risks of running in-house network services.
Google didn't immediately respond to an initial request for comment on the problems.
Updated 1:18 a.m. PT and 12:28 p.m. PT with comment from You vs. the Internet and from Google.
Updated 4:46 a.m. PT September 16 with further comment from Google that most applications worked between the initial scheduled maintenance outage and the later unplanned outage and with details from the postmortem analysis.