Expanding its cloud-computing storage services to a higher level, Amazon.com unveiled a new option called Amazon RDS for companies that want to store information in a database on the other side of the Internet.
The suite of Amazon Web Services (AWS) already included a database option called SimpleDB, a basic database with its own interface standard for storing data and retrieving it. The Amazon Relational Database Service, in contrast, uses a more standard database interface, embodied in this case in an online implementation of the open-source MySQL software, the company said Monday.
"With Amazon RDS, you get full native access to a MySQL database," specifically, version 5.1 of the Sun Microsystems technology, the company said on its Amazon RDS site. "This means Amazon RDS works with your existing tools, applications, and drivers. You can port an existing database to Amazon RDS without changing a line of code--just point your tools or applications at your Amazon RDS DB instance, and you are ready to go."
Amazon cited reduced hassle and greater flexibility as reasons to use the service, which is currently in beta testing.
"Every hour that you don't spend fiddling with hardware, tracing cables, installing operating systems, or managing databases is an hour that you can spend on the unique and value-added aspects of your application," Jeff Barr, the company's Web services evangelist, said in a blog post. "I should point out that RDS enables a lot of really enticing development and test scenarios. You can set up a separate database instance for each developer on a project without making a big investment in hardware."
With its years-long effort, the Net retailer has built Amazon Web Services into a formidable presence in the information technology world. Competitors include Google App Engine, a computing foundation that can run Java or Python programs on Google's own BigTable database technology, and Microsoft's Azure, which is set to offer access to Windows servers in the cloud when it formally launches in November.
One potentially interesting rival is Oracle, already a giant in the database market and, if it can overcome European regulatory concerns, the future owner of MySQL assets. Because MySQL is open-source software, though, anyone may use and modify it, even without its copyright holders' permission.
The biggest competitor to this model is doing things the old way, with companies running their own computing infrastructure. Cloud computing poses security and trust issues for many companies considering whether to put their data and business applications on somebody else's computer systems. But researchers such as Gartner, an influential but not radical analyst firm, now recommend that companies look seriously at cloud computing.
Amazon is working on greater robustness for Amazon RDS. It offers automated backup, and it later plans to offer a "high-availability" option at no extra charge, with which customers can create a separate instance of a database in a different geographic region.
As with all services on AWS, Amazon RDS is priced on an as-used basis--with per-hour charges according to the server memory requirements of the database: 11 cents per hour for a small database of 1.7GB of RAM; 44 cents for large, or 7.5GB; 88 cents for extra-large, or 15GB; $1.55 for double extra-large, or 34GB; and $3.10 for quadruple extra-large, or 68GB. There also are charges for the size of data stored, the number of input-output requests, the amount of data written to the database, and the amount of data read from the database.
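The per-hour rates above are enough for a quick back-of-the-envelope estimate. The Python sketch below prices only the instance-hour component. Storage, I/O request, and data transfer charges are left out because the article gives no rates for them. The size labels, the helper names, and the 730-hour month are illustrative assumptions, not part of Amazon's pricing or API.

```python
# Hourly Amazon RDS instance rates as quoted in the article (US dollars).
# The size labels are illustrative names, not official API identifiers.
RDS_HOURLY_RATES = {
    "small": (1.7, 0.11),                 # (RAM in GB, dollars per hour)
    "large": (7.5, 0.44),
    "extra-large": (15.0, 0.88),
    "double-extra-large": (34.0, 1.55),
    "quadruple-extra-large": (68.0, 3.10),
}

HOURS_PER_MONTH = 730  # assumption: an average month (8,760 hours / 12)


def monthly_instance_cost(size: str, hours: float = HOURS_PER_MONTH) -> float:
    """Instance-hour charge only; storage, I/O, and transfer fees are excluded."""
    _ram_gb, rate = RDS_HOURLY_RATES[size]
    return round(rate * hours, 2)


def cheapest_size_for(ram_needed_gb: float) -> str:
    """Pick the least expensive size whose memory covers the requirement."""
    candidates = [
        (rate, size)
        for size, (ram_gb, rate) in RDS_HOURLY_RATES.items()
        if ram_gb >= ram_needed_gb
    ]
    if not candidates:
        raise ValueError(f"no size offers {ram_needed_gb} GB of RAM")
    return min(candidates)[1]  # tuples compare by rate first


if __name__ == "__main__":
    for size in RDS_HOURLY_RATES:
        print(f"{size:>22}: ${monthly_instance_cost(size):,.2f} per month")
```

For instance, a small database left running for a full 730-hour month comes to about $80.30 before any storage or I/O charges. `cheapest_size_for(10)` picks the extra-large size, the smallest one with at least 10GB of RAM.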
The debate about the validity of internal cloud implementations has raged on for some time now, with some claiming that cloud computing and wholly owned infrastructure don't mix, and others pointing out that applying "on demand," "at scale," and "multitenant" to enterprise IT data centers offers unique advantages to those who have already made that investment. It has been difficult, however, to compare the two approaches objectively, until now.
The announcement on Thursday of Amazon's new Hadoop-based Elastic MapReduce service, combined with the introduction of a commercial Hadoop distribution from start-up Cloudera, means that we finally have a reasonable way to watch which direction enterprise IT prefers. Let me explain.
Amazon's service is a simplified, prepackaged Hadoop implementation that can be leveraged by anyone with an Amazon account. The Amazon Web Services blog describes it as follows:
Today we are rolling out Amazon Elastic MapReduce. Using Elastic MapReduce, you can create, run, monitor, and control Hadoop jobs with point-and-click ease.
You don't have to go out and buy scads of hardware. You don't have to rack it, network it, or administer it. You don't have to worry about running out of resources or sharing them with other members of your organization. You don't have to monitor it, tune it, or spend time upgrading the system or application software on it.
You can run world-scale jobs anytime you would like, while remaining focused on your results. Note that I said jobs (plural), not job. Subject to the number of EC2 (Elastic Compute Cloud) instances you are allowed to run, you can start up any number of MapReduce jobs in parallel. You can always request an additional allocation of EC2 instances here.
Processing in Elastic MapReduce is centered around the concept of a Job Flow. Each Job Flow can contain one or more steps. Each step inhales a bunch of data from Amazon S3, distributes it to a specified number of EC2 instances running Hadoop (spinning up the instances if necessary), does all of the work, and then writes the results back to S3.
Each step must reference application-specific "mapper" and/or "reducer" code (Java JARs or scripting code for use via the Streaming model). We've also included the Aggregate Package with built-in support for a number of common operations such as Sum, Min, Max, Histogram, and Count. You can get a lot done before you even start to write code!
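To make the Streaming model concrete, here is a minimal word-count sketch in Python. A Streaming mapper reads raw lines on standard input and writes tab-separated key/value records to standard output. Hadoop sorts those records by key before the reducer sees them. The `run_locally` helper is a hypothetical stand-in for Hadoop's shuffle-and-sort so the logic can be tried without a cluster; it is not part of Elastic MapReduce.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record per word, as a Streaming mapper would on stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records: Iterable[str]) -> Iterator[str]:
    """Sum counts per key; Hadoop hands the reducer its input already sorted by key."""
    parsed = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _word, count in group)
        yield f"{word}\t{total}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Imitate map, then shuffle/sort, then reduce in one process (testing only)."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # Local trial run: read text from stdin and print the word counts.
    for record in run_locally(sys.stdin):
        print(record)
```

In an actual Job Flow step, the mapper and reducer would normally be shipped as separate scripts that read `sys.stdin` and print their records. With the Aggregate Package, as we understand Hadoop's Streaming documentation, the mapper would instead emit records such as `LongValueSum:the<TAB>1` and the step would name `aggregate` as its reducer, so no reducer code would be needed.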
Cloudera, on the other hand, provides a Hadoop build that you can deploy wherever you wish:
Cloudera's Distribution for Hadoop is based on the most recent stable version of Apache Hadoop. It includes some useful patches back-ported from future releases, as well as improvements we have developed for our support customers.
Cloudera's Distribution includes everything you need to configure and deploy Hadoop using standard Linux system administration tools.
Here's what I'm thinking: enterprise IT is looking at an entirely new class of applications that take advantage of MapReduce to process very large sets of both structured and unstructured data for things like predictive analysis, sorting/sequencing, and data mining. Both commercial Hadoop offerings meet the demand for a platform to simplify the development and operation of these applications. The primary difference is the where, not so much the what.
That is exactly what will make the competition between the two offerings so compelling to watch. Let me break it down for you:
Will the requirement to own and operate hardware work against Cloudera? What makes the Amazon offering so groundbreaking (and it will prove to be historic, in my opinion) is that it is now possible for anyone with a need to analyze large data sets to do so simply for the cost of data storage plus processing time. (Note that using Elastic MapReduce adds a nominal charge on top of the cost of the server instances that run the jobs.)
Where "grid computing" was once the playground of large enterprises and academic institutions that could afford the hardware and justify the cost of building it out, Amazon makes it possible for even individuals to run such jobs for a few tens or hundreds of dollars.
Cloudera, on the other hand, requires that the hardware be available to install it on. That either means existing server capacity, new hardware (which greatly adds to the cost, and can only be justified for continuous Hadoop use), or leased capacity. The latter starts to look a lot like Amazon's service.
Will Amazon's requirement to use S3 work against it? I see three reasons it might:
- The commonly cited concern about data security outside of corporate firewalls. (Even if the perception is wrong, the perception exists.)
- The cost of data transfer to and from the S3 service--currently as high as 17 cents per gigabyte a month.
- The cost of storage of both the raw data and the aggregate results--currently as high as 15 cents per gigabyte a month.
It should be noted that if you already rely on S3 to store the data sets to be processed, this is a great deal. However, if you have to upload terabytes or even petabytes of data to be combed through by MapReduce, the transfer alone could get quite pricey, and existing infrastructure might serve the purpose better. If you are going to leave the data up there permanently, and update it regularly, the cost of Amazon's service should be weighed against the cost of owning and operating that storage yourself in your existing facilities.
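A rough calculation shows how quickly those two line items add up. The sketch below makes several assumptions: the article's ceiling rates apply to every gigabyte (real pricing was tiered, so this overstates the cost), both the raw data and the results cross the S3 boundary once and stay stored for the whole period, and a terabyte is 1,024GB. All names are hypothetical.

```python
from dataclasses import dataclass

# Upper-bound S3 rates quoted in the article, in US dollars.
TRANSFER_PER_GB = 0.17        # data moved into or out of S3
STORAGE_PER_GB_MONTH = 0.15   # data kept in S3 for one month
GB_PER_TB = 1024              # assumption: binary terabytes


@dataclass
class S3Estimate:
    transfer: float
    storage: float

    @property
    def total(self) -> float:
        return round(self.transfer + self.storage, 2)


def estimate_s3_cost(raw_tb: float, results_tb: float, months: int) -> S3Estimate:
    """Ceiling estimate: each byte is transferred once and stored for `months` months."""
    total_gb = (raw_tb + results_tb) * GB_PER_TB
    return S3Estimate(
        transfer=round(total_gb * TRANSFER_PER_GB, 2),
        storage=round(total_gb * STORAGE_PER_GB_MONTH * months, 2),
    )


if __name__ == "__main__":
    estimate = estimate_s3_cost(raw_tb=1, results_tb=0, months=12)
    print(f"transfer ${estimate.transfer:,.2f} + storage ${estimate.storage:,.2f} "
          f"= ${estimate.total:,.2f}")
```

Under these assumptions, one terabyte of input kept for a year comes to roughly $2,017 ($174.08 to move it plus $1,843.20 to store it), and a petabyte would cost about 1,024 times as much.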
Will the so-called "barrier of exit" stand up? I'm not even arguing that the choice will be based solely on the comparative costs to the business. In fact, what I am interested in is the extent to which business units and departments will simply bypass IT altogether to build and run their own jobs in Amazon Elastic MapReduce.
If IT maintains a valuable service using existing facilities and computing investments, then Cloudera will likely do fine. If not, then Amazon stands to be the overwhelmingly dominant commercial Hadoop implementation.
I should also note that running a Hadoop instance is not the same thing as cloud computing in and of itself. An internal Cloudera implementation is not necessarily an internal cloud, though if operated "on demand," "at scale," and with multitenancy, it certainly qualifies as a cloud.
I will be watching this space closely for the next year or two. I have a feeling that Amazon will do fine, regardless, as there are many possible implementations that would benefit from a completely public cloud implementation. The real test is probably how much opportunity Cloudera finds within enterprise data centers.
Cloudera also has much more competition from the free downloads of Hadoop than Amazon has, in my opinion, as it faces a more traditional open-source competitive landscape.
Is your company looking at MapReduce for a new generation of data-mining applications? If so, what will you choose: the public, external cloud implementation of Hadoop from Amazon Web Services, or the wholly owned, internal implementation of the same from Cloudera?
Google App Engine is taking a step toward maturity: on Tuesday, Google plans to begin letting people who use the cloud-computing foundation pay for heavy use.
When Google launched App Engine last April, it was available only as a free service with caps on computing and network resource usage. Free use is still available for lower-traffic sites, but Google now lets users pay for higher access as needed.
"It's been one of our biggest developer requests," said Pete Koomen, Google App Engine product manager.
The billing feature makes Google App Engine useful for those who want to run real applications on the site, not just kick the tires, as long as they're willing to pay and to put up with the continued "preview release" status. However, the service hasn't even attained "beta" level, much less a service level agreement (SLA) that promises refunds if the service goes down for too long.
Google offers such an agreement for its Google Apps online tools. "It is something we are exploring" for Google App Engine, spokesman Jon Murchinson said.
Google App Engine competes with various other cloud-computing efforts, including Amazon's lower-level suite of Web services components, but mostly with the alternative of hosting applications on one's own equipment. Amazon Web Services also uses a pay-as-you-go pricing model.
Here's Google's description of how billing will work:
$0.10 per CPU core hour. This covers the actual CPU time an application uses to process a given request, as well as the CPU used for any Datastore usage.
$0.10 per GB bandwidth incoming, $0.12 per GB bandwidth outgoing. This covers traffic directly to/from users, traffic between the app and any external servers accessed using the URLFetch API, and data sent via the Email API.
$0.15 per GB of data stored by the application per month.
$0.0001 per email recipient for emails sent by the application.
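Because each rate is linear, a month's bill is just a weighted sum of usage. The sketch below uses Python's `decimal` module so that fractions of a cent don't pick up floating-point error. The usage keys are hypothetical labels, not App Engine API names. The free quota is assumed to have been subtracted already, since its size isn't given here.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Dict

# Google's announced App Engine rates as quoted above, in US dollars.
# The keys are made-up labels for this sketch, not App Engine identifiers.
RATES = {
    "cpu_core_hours": Decimal("0.10"),
    "gb_incoming": Decimal("0.10"),
    "gb_outgoing": Decimal("0.12"),
    "gb_stored_per_month": Decimal("0.15"),
    "email_recipients": Decimal("0.0001"),
}


def monthly_bill(usage: Dict[str, float]) -> Decimal:
    """Price a month of billable usage (free quota assumed already subtracted)."""
    unknown = set(usage) - set(RATES)
    if unknown:
        raise ValueError(f"unknown usage keys: {sorted(unknown)}")
    total = sum(
        (RATES[item] * Decimal(str(amount)) for item, amount in usage.items()),
        Decimal("0"),
    )
    # Round to whole cents, the way a bill would be presented.
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, 100 CPU core hours, 10GB in, 50GB out, 5GB stored, and 10,000 email recipients come to $18.75 under these rates.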
Koomen wouldn't comment on the matter, so you'll have to decide for yourself whether Google is trying to set prices low to attract users, medium to cover expenses, or high to generate revenue during Google's new era of financial discipline.
App Engine is designed to run Web applications written in the Python programming language, though Google plans to add other language support in the future. One of its chief selling points is that it's built on Google's computing infrastructure, letting applications rapidly scale if demand for them spikes without the organization running the application having to scare up a large number of new servers and network capacity.
In conjunction with its S3 storage offering and other Web Services products, ever-expanding Web giant Amazon has launched a beta version of a content delivery network called CloudFront.
The service, which promises "low latency, high data transfer speeds, and no commitments," uses a global network of edge locations to keep the system humming.
Amazon announced in September its intentions to launch a CDN, with a target date of the end of 2008. It also made clear then that pricing would be consumption-based. Amazon has declared that there is "no minimum fee" for CloudFront; customers pay only for what they use.
There are loads of CDNs out there: it's an on-demand, business-focused offering for which companies are willing to pay good money. But because Amazon already has a big grip on the cloud with its existing Simple Storage Service, or S3, CloudFront is likely to be a power player from the start.
A central part of Amazon's online computing foundation is growing up.
The Elastic Compute Cloud, a service that gives customers on-demand access to Linux servers, is now out of beta testing, said Jeff Barr, evangelist for the collection of online options collectively called Amazon Web Services.
"Amazon EC2 is now in full production," Barr said in a blog post Thursday. And as promised, EC2 now offers Windows in a beta test, joining Sun Microsystems' OpenSolaris and Solaris Express Community Edition.
Along with those moves, EC2 now comes with a service level agreement, a formal commitment that the service will be available at least 99.95 percent of the time. This type of agreement makes it easier for businesses to place faith in the service. Previously, the only AWS component with a service level agreement was the Simple Storage Service (S3), which provides online data storage.
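What does 99.95 percent mean in practice? The sketch below converts an availability percentage into a downtime budget. The measurement periods (a 365-day year and a 30-day month) are assumptions chosen for illustration. The article does not say over what period Amazon measures uptime.

```python
HOURS_PER_YEAR = 365 * 24   # 8,760 hours, ignoring leap years
HOURS_PER_MONTH = 30 * 24   # assumption: a 30-day month


def downtime_budget_minutes(availability_percent: float, period_hours: float) -> float:
    """Minutes of downtime allowed while still meeting the availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = (100 - availability_percent) / 100
    return period_hours * 60 * unavailable_fraction


if __name__ == "__main__":
    for label, hours in (("year", HOURS_PER_YEAR), ("month", HOURS_PER_MONTH)):
        minutes = downtime_budget_minutes(99.95, hours)
        print(f"99.95% over a {label}: about {minutes:.1f} minutes of downtime")
```

That works out to roughly 262.8 minutes (about 4.4 hours) a year, or about 21.6 minutes in a 30-day month.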
Customers pay for AWS according to how much they need: more servers, more storage space, and more network capacity means more charges. But unlike with computing infrastructure built in-house, when customers don't need it anymore, they can stop paying for it. AWS has had outages, but it continues to gain in popularity, and Amazon has been lowering some AWS prices.
Amazon collects multiple gigabits of monitoring data each second for its Elastic Compute Cloud service. (Credit: Amazon.com)
Barr also described features planned for 2009 that signal growing sophistication for AWS overall and should make it easier to administer, whether manually or by letting AWS run itself better. Barr listed four areas:
Management Console: The management console will simplify the process of configuring and operating your applications in the AWS cloud. You'll be able to get a global picture of your cloud computing environment using a point-and-click web interface.
Load Balancing: The load-balancing service will allow you to balance incoming requests and traffic across multiple EC2 instances.
Automatic Scaling: The auto-scaling service will allow you to grow and shrink your usage of EC2 capacity on demand based on application requirements.
Cloud Monitoring: The cloud-monitoring service will provide real time, multidimensional monitoring of host resources across any number of EC2 instances, with the ability to aggregate operational metrics across instances, Availability Zones, and time slots.
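Barr's post doesn't describe how the auto-scaling service will decide when to grow or shrink. One common approach is to size a fleet so that average CPU utilization moves toward a target, and the sketch below shows that idea. The 60 percent target, the minimum and maximum fleet sizes, and the function name are all hypothetical. This is not a description of Amazon's service.

```python
import math
from typing import Sequence


def desired_capacity(
    current_instances: int,
    cpu_percent_samples: Sequence[float],
    target_cpu_percent: float = 60.0,   # hypothetical utilization target
    min_instances: int = 1,
    max_instances: int = 20,
) -> int:
    """Return how many instances should run so average CPU approaches the target."""
    if not cpu_percent_samples:
        # No metrics yet: keep the current size, clamped to the allowed range.
        wanted = current_instances
    else:
        average = sum(cpu_percent_samples) / len(cpu_percent_samples)
        # Total work ~ instances * average utilization; spread it at the target level.
        wanted = math.ceil(current_instances * average / target_cpu_percent)
    return max(min_instances, min(wanted, max_instances))
```

Rounding up with `math.ceil` biases the fleet toward spare capacity. Clamping to a minimum and maximum keeps a noisy metric from shrinking the fleet to nothing or growing it without bound.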
In a separate blog post, Amazon Chief Technology Officer Werner Vogels described some of Amazon's work in ensuring reliability and efficiency.
"We relentlessly measure every possible resource usage parameter, every application counter, and every customer's experience. Many gigabits per second of monitoring data flows continuously through the Amazon networks to make sure that our customers are getting serviced at the levels they can expect and at an efficiency level the business desires," Vogels said.
Among the customers using the Windows version of EC2 are Autodesk, RenderRocket, and Eli Lilly, Amazon said.
"This is a huge step forward in maximizing our results relative to IT spend, and now that Amazon EC2 runs Windows and SQL Server, we have even greater flexibility in the kinds of applications we can build in the AWS cloud," Dave Powers, an Eli Lilly associate information consultant who uses the service to process research data, gushed in a statement.
Autodesk uses EC2 for back-end data processing tasks, said Mike Haley, a senior architect of search engineering, and RenderRocket uses the service for 3D graphics work for film and TV, Amazon said.
Looking to take on more demanding customers, Amazon Web Services on Thursday rolled out two paid-support plans that give customers access to its engineers to resolve glitches.
The company said it will offer two levels of support--gold and silver--for a fixed annual fee or a percentage of customers' total usage of its services. The support plans are available for its Elastic Compute Cloud (EC2), Simple Storage Service (S3), and Amazon Simple Queue Service (Amazon SQS). For more details on the terms, click here.
Right now, Amazon offers pay-as-you-go pricing for its hosted services. Customers pay for how much they use the service. To get support for technical problems, they need to go to free forums.
The paid support is a sign that Amazon's hosted computing is ramping up to take on a broader swath of clients, including large businesses.
Initially, Amazon aimed the hosted service at Web start-ups, but it's signing on business customers too. BusinessWeek reported earlier this week that The New York Times and Nasdaq are now customers.
The support service also casts Amazon more in the mold of traditional IT providers such as IBM, Hewlett-Packard, and Sun Microsystems, which all offer a variation on hosted computing.
"Guaranteed support will also allow us to develop even more substantial applications using Amazon Web Services, knowing that Amazon is there to support us," Paul Horvath, chief technology officer of health care form-processing company TC3 Health, said in a statement.
Update on Friday: Link added to Amazon Web Services support terms and costs.
As people get their heads around Google App Engine, they see some things they may not like. Namely, the dreaded "lock-in."
Developers for years have been clamoring for more openness and standards. They are tenets of the open-source movement.
But as more application development moves to hosted platforms, do data and application portability get lost in "the cloud"?
Given that we're at an early point in platform-as-a-service offerings, I'd say lock-in, to some degree, is inevitable. Most people consider Salesforce.com's Force.com closed, as it's based on the company's database and query language.
But Google? The search giant is hosting a Web development conference next month, not to sell more software stacks or subscriptions, but to encourage more apps--and people--to move to the Web, it says.
Still, O'Reilly takes Google to task for the lack of application portability--at least in this first iteration of Google App Engine.
"Now, it may be that this is a temporary oversight, and that Google does intend, long-term, to make it easy for developers to export their applications. After all, Eric Schmidt says he reminds his employees all the time, 'Don't fight the Internet.'
"But it's also possible that this is one more sign that one of the big guys is forgetting the principles--the Internet as a platform (not 'my company as a platform'), harnessing the power of user contribution (which, as John Musser pointed out, means that you always 'pay the user first'), small pieces loosely joined--that brought their success in the first place."
Think his concerns are overblown? He's not the only one.
Within a few days of its release, programmer Chris Anderson wrote some open-source software, called AppDrop, that shows that you can conceivably run an application written for Google App Engine on Amazon.com's Elastic Compute Cloud (EC2), Amazon's hosted server platform.
Developer Alex Bosworth listed lock-in as his top concern with Google App Engine.
It's likely that Google will allow applications written with other languages, like JavaScript. But the nub of online-platform lock-in comes from the data store, Bosworth said.
One thing both Amazon and Google could do to really show they are serious about their platforms is open up their data engines, which are really the core of most Web applications--open-source BigTable and SimpleDB. This would really reduce lock-in and make development easier, and it might even lead to some help improving their services.
O'Grady at RedMonk, too, argued that Google should open-source portions of its infrastructure or offer an API (application programming interface) to its data store that would ease portability to other databases.
Google appears to already be on the case of data portability. On the Google App Engine Blog, software engineer Kevin Gibbs said that one planned feature is large-scale data import and export.
"With Google App Engine, you own all the data in your app. As stated in our terms, you always have the right to get your data out of Google App Engine at any point. We wouldn't have it any other way," Gibbs wrote.
Once again, Google gets tongues a-waggin', even when it isn't the first to a party.
But it's good to see these issues raised and for developers to push for more openness. After all, standards, portability, and interoperability have been good to the Web.
Updated at 8:45 a.m. PT with information from the Google App Engine blog on planned data migration tools.
Amazon's new persistent storage is just like an unformatted hard drive, Amazon.com Chief Technology Officer Werner Vogels explained. The difference is that it's in the "cloud" somewhere and you get to it through an API.
Amazon Web Services executives on Sunday described a forthcoming persistent storage feature, called EC2 Persistent Storage, which they say will make its hosted computing services more flexible and far more reliable.
People can sign up for an early beta test program now before Amazon opens it up for a wider release later this year.
The service works with Amazon's Elastic Compute Cloud (EC2) hosted server offering. It allows developers to set aside a storage volume online and format it with the file system of their choice. That differs from EC2 as it works now, where the data associated with a compute instance goes away once the instance is taken offline.
With a persistent storage service, data can remain linked to a specific computing instance. Significantly, people can take a snapshot of that data and store it on Amazon's S3 storage service. That effectively creates a backup of their computing operation in the "cloud," according to Amazon executives.
"The snapshot is extremely powerful technology and allows for building highly fault-tolerant applications operating worldwide. Combine these snapshots with Availability Zones and Elastic IPs and you have all the tools to manage and migrate even the most complex of applications," Vogels wrote on his blog.
"And the great thing is that it is all done using standard technologies such that you can use this with any kind of application, middleware or any infrastructure software, whether it is legacy or brand new," he added.
Amazon Web Services evangelist Jeff Barr also describes the service on his blog, saying it was one of the most requested features from developers.
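The difference between instance-local storage and a persistent volume with snapshots can be shown with a toy in-memory model. Everything here is hypothetical: `Instance`, `Volume`, `snapshot`, and `restore` only mimic the behavior described above and are not Amazon API calls. A plain dictionary stands in for an S3 bucket.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

Files = Dict[str, bytes]


@dataclass
class Volume:
    """Persistent volume: its contents outlive any instance it is attached to."""
    files: Files = field(default_factory=dict)


@dataclass
class Instance:
    """Compute instance with a local disk and, optionally, an attached volume."""
    local_disk: Files = field(default_factory=dict)
    volume: Optional[Volume] = None
    running: bool = True

    def terminate(self) -> Optional[Volume]:
        """Take the instance offline: local disk is wiped, the volume is detached and kept."""
        self.running = False
        self.local_disk.clear()
        detached, self.volume = self.volume, None
        return detached


def snapshot(volume: Volume, bucket: Dict[str, Files], name: str) -> None:
    """Copy the volume's current contents into an S3-like store under `name`."""
    bucket[name] = dict(volume.files)


def restore(bucket: Dict[str, Files], name: str) -> Volume:
    """Create a fresh volume from a stored snapshot."""
    return Volume(files=dict(bucket[name]))
```

The snapshot is a point-in-time copy, so later changes to the volume (including accidental ones) leave it untouched, which is what makes it usable as a backup.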
Thorsten von Eicken at RightScale, who has been testing the service, discusses the implications of this feature and says his company is building tools to make these services easier to use.
Von Eicken says persistent storage is an extremely important feature that will lead many more companies and developers to hosted development platforms.
"It's going to be like agile software development: if you want to survive as an Internet/Web service you will have to compute in the cloud or your competitors will leave you in the dust by being able to deploy faster, better, and cheaper," he said.
On February 15 this year, Amazon S3, the "cloud" storage service that's part of the Amazon Web Services suite of infrastructure applications, failed. Web 2.0 entrepreneurs who had been attracted to AWS based on its promised reliability and low cost had their confidence shaken. Several lost revenue when the service seized up.
Last week at the Under the Radar conference, Amazon CTO Werner Vogels sat down for an interview with Robert Scoble. The discussion of course came around to the S3 outage, and Vogels explained what happened. It was, he says, a "provisioning" and "logical" problem. Translated: They didn't program S3 to handle the load they got. It has since been fixed. Amazon also recently upgraded its hosted computing service, EC2.
But while Vogels expressed unhappiness at the outage, he also believes that Amazon's cloud services are still more reliable than any collection of servers a budding Web start-up could marshal. While that may be true, that's not what companies who signed up for AWS signed up to hear. We think a simple mea culpa would have gone over better.
Amazon Web Services on Thursday is scheduled to release features meant to give its hosted computing service a better safety net.
Amazon's Elastic Compute Cloud (EC2) service now has an application programming interface (API) that lets developers choose where their applications physically run.
"Up until now, if you boot up more than one EC2 instance, you had no control where it resided--it could hypothetically be sitting on the same machine because there is no notion of location or proximity," said Adam Selipsky, vice president of product management and developer relations at Amazon Web Services.
"Now we're exposing that as a feature and you can choose to instantiate your 'nth' server in a different availability zone," he said.
Amazon Web Services last month suffered a multi-hour outage to its Simple Storage Service (S3), which affected several Web 2.0 sites.
Selipsky said the new feature will let developers add redundancy in the "vast majority" of cases.
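Placing instances in different availability zones is what provides that redundancy. A simple way to do it is round-robin placement, sketched below. The zone names are made-up placeholders, and the sketch only decides placement. It does not call the EC2 API.

```python
from itertools import cycle
from typing import List, Sequence


def spread_across_zones(instance_count: int, zones: Sequence[str]) -> List[str]:
    """Assign each new instance to the next zone in turn (round-robin)."""
    if not zones:
        raise ValueError("need at least one availability zone")
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(instance_count)]


def survives_single_zone_outage(placement: Sequence[str]) -> bool:
    """True if losing any one zone still leaves at least one instance running."""
    return len(set(placement)) >= 2
```

With only one zone in the list, every instance lands in the same place, which offers no protection against a zone-wide failure.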
Amazon currently gives developers the option of deploying their S3 data either in Europe or the United States.
Selipsky said Amazon will add more "granularity" on the choice of location for data over time.
Also on Thursday, Amazon Web Services introduced an IP addressing service, called Elastic IP, that lets developers associate an IP address with an account rather than with a physical machine.
The change makes EC2 better suited for Web application hosting, Selipsky said.
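The practical benefit of tying an address to an account is that the address can move. The toy model below shows an address being remapped from a failed instance to a replacement so that clients keep using the same IP. The class and method names are hypothetical, and the addresses come from a documentation-only range. It models the idea only and does not use any AWS API.

```python
from typing import Dict, Optional


class ElasticAddressTable:
    """Toy registry: addresses belong to accounts and point at an instance the owner picks."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}              # address -> owning account
        self._target: Dict[str, Optional[str]] = {}   # address -> instance id

    def allocate(self, account: str, address: str) -> None:
        """Reserve an address for an account; it starts unattached."""
        if address in self._owner:
            raise ValueError(f"{address} is already allocated")
        self._owner[address] = account
        self._target[address] = None

    def associate(self, account: str, address: str, instance_id: str) -> None:
        """Point an address at an instance; only the owning account may do this."""
        if self._owner.get(address) != account:
            raise PermissionError(f"{account} does not own {address}")
        self._target[address] = instance_id

    def resolve(self, address: str) -> Optional[str]:
        """Return the instance currently behind an address, if any."""
        return self._target.get(address)
```

Because the address belongs to the account, replacing a failed server takes one `associate` call instead of handing clients a new address.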





