The Amazon Web Services outage in April created uncertainty about the reliability of the cloud. And while the outage may have cast doubt on the cloud's ability to handle mission-critical applications, Xeround CEO Razi Sharir, whose MySQL cloud database-as-a-service goes into general availability today, sees the event as an opportunity to better understand the unique characteristics of the cloud environment--namely availability and elasticity.
Via an e-mail interview, Sharir maintained that high availability in the cloud is different from maintaining availability in a traditional data center because there is limited control over the cloud infrastructure. He notes that in the cloud, high availability isn't just about hardware resiliency--rather it depends on having "more of the same" resources, and the ability to dynamically provision them across any and all data centers/configurations so that when a machine fails, you're able to maintain service.
Availability is an issue even in an on-premises data center. The cloud introduces an extra level of separation between the organization and the infrastructure, which does create some unique challenges that should be properly addressed when designing an application to run on a cloud stack--but it's no different from treating availability as a key performance indicator in any other stack/environment.
Sharir advises that when migrating to the cloud, one should assume that server crashes, hardware malfunctions, and other failures are to be expected; the key is to anticipate these issues and address them in a way that is both immediate and transparent to the application. Ideally, handling the problem should be as automated as possible so database administrators don't need to be on call 24/7, waiting for the next failure.
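The "assume failures will happen" approach can be sketched in a few lines: a client that tries each database endpoint in turn, so a crashed node is skipped without the application noticing. The endpoint names and the `connect` callback here are hypothetical illustrations, not part of Xeround's product.

```python
# Hypothetical endpoint list; in practice these would be real node addresses.
ENDPOINTS = [
    "db-primary.example.com",
    "db-replica-1.example.com",
    "db-replica-2.example.com",
]

def execute_with_failover(query, connect, endpoints=ENDPOINTS):
    """Run `query` against the first reachable endpoint.

    `connect` is any callable that takes an endpoint and returns a
    connection-like object with an `execute` method, raising
    ConnectionError when the node is down.
    """
    last_error = None
    for endpoint in endpoints:
        try:
            conn = connect(endpoint)
            return conn.execute(query)
        except ConnectionError as err:
            last_error = err  # node is down; fall through to the next one
    raise RuntimeError("all database endpoints failed") from last_error
```

Because the failover loop lives in one place, the application code above it never sees the failure--which is exactly the transparency Sharir describes.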
According to Sharir, the main promise of cloud computing is elasticity. While scaling an application is fairly easy to accomplish, scaling a database in a virtualized cloud is a different story.
Scaling a database typically entails adding throughput capability. For reads, this can be done by adding read replicas (which requires application adjustments for every change). For writes, it gets tricky: most databases allow only one node to perform writes, so once that node is exhausted, a larger, stronger node is required. When the peak is over and traffic returns to "normal," the whole process needs to be reversed.
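The pattern Sharir describes--one write primary, several read replicas--is often handled by a small router in front of the database, which is also where the "application adjustments for every change" creep in. This is a generic sketch with made-up node names, not Xeround's implementation:

```python
class ReadWriteRouter:
    """Send writes to the single primary; spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self._next = 0  # round-robin cursor for read traffic

    def route(self, sql):
        # Naive classification: only statements starting with SELECT
        # are treated as reads; everything else goes to the primary.
        if sql.lstrip().upper().startswith("SELECT"):
            replica = self.replicas[self._next % len(self.replicas)]
            self._next += 1
            return replica
        return self.primary
```

Note that the replica list is baked into the router: adding or removing a replica means reconfiguring (or redeploying) this layer, which is the manual adjustment a truly elastic service would eliminate.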
True elasticity shouldn't require downtime, and it should be automated and transparent to the application, meaning no code or architectural changes are required. Elasticity also needs to be unlimited--able to scale up or out, down or in. Adding or removing both capacity and throughput as the application requires should be seamless and highly granular.
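As a toy illustration of scaling "up or out, down or in" automatically, a proportional policy might size the cluster from observed load. The target-utilization figure and the node bounds below are arbitrary assumptions, not anything Xeround publishes:

```python
import math

def desired_nodes(current_nodes, cpu_utilization,
                  target=0.5, min_nodes=1, max_nodes=16):
    """Size the cluster so average utilization lands near `target`.

    Scales out when the cluster runs hot and back in when load drops,
    clamped to the [min_nodes, max_nodes] range.
    """
    needed = math.ceil(current_nodes * cpu_utilization / target)
    return max(min_nodes, min(max_nodes, needed))
```

For example, four nodes running at 75% utilization would grow to six, and the same four nodes idling at 25% would shrink to two--the reversal Sharir notes usually has to be done by hand.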
One of the promises of the cloud is that users are supposed to be able to do away with talk of "servers" or "instance size." Optimal resources as required by the database should be available at any given moment, so you no longer need to over-provision (and over-pay) to anticipate the peak, or scramble to scale in a hurry when you see the pleasant surprise of unexpected growth.
Despite the noise, the extent to which the cloud permeates or replaces the physical data center remains to be seen (though it goes without saying that many are exuberantly bullish). Perhaps with a better understanding of how cloud databases should work, more organizations will be willing to make that leap with mission-critical applications.