As reported on CNET, Amazon Web Services has announced a new pricing option that lets its customers take advantage of spare capacity within the EC2 infrastructure at variable, supply-and-demand-driven pricing.
The news has taken the cloud community by storm. For some, it represents the beginning of a long-anticipated move to market pricing for core IT infrastructure services.
While there is some truth to the importance of AWS spot pricing to the history of cloud computing, let's keep things in perspective: this pricing is set by Amazon, not any market. We are a long way from a true commodity market for any form of cloud computing service.
Before I go any further, let's review how the feature works:
- Each customer sets a maximum price he or she is willing to pay for "spot instances."
- Amazon sets a "spot price" for instances hour by hour, based on available supply and demand.
- Customers pay whatever the spot price is, as long as it does not exceed their maximum price. So, if someone bids $0.07/hour, and the spot price is $0.05/hour, the person pays $0.05/hour.
- If the spot price exceeds the customer's maximum price, the customer's instances are terminated.
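The rules above can be sketched in a few lines of Python. This is only an illustration of the billing logic as described here, not Amazon's implementation; the function names and the sample prices are hypothetical.

```python
from decimal import Decimal
from typing import Iterable, Optional, Tuple


def hourly_charge(max_bid: Decimal, spot_price: Decimal) -> Optional[Decimal]:
    """Price paid for one hour, or None if the instance would be terminated."""
    if spot_price > max_bid:
        return None  # spot price exceeds the bid: the instance is terminated
    return spot_price  # the customer pays the spot price, not the bid


def run_until_terminated(
    max_bid: Decimal, hourly_spot_prices: Iterable[Decimal]
) -> Tuple[int, Decimal]:
    """Return (hours run, total paid), stopping at the first hour the bid is exceeded."""
    hours, total = 0, Decimal("0")
    for price in hourly_spot_prices:
        charge = hourly_charge(max_bid, price)
        if charge is None:
            break
        hours += 1
        total += charge
    return hours, total
```

With a $0.07 bid and a $0.05 spot price, `hourly_charge` returns the spot price, matching the example above. Once the spot price climbs past the bid, `run_until_terminated` stops counting hours.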
Spot pricing is the third EC2 pricing option, joining the existing on-demand and reserved instance options. The first two options targeted two critical use cases for cloud computing: reserved instances for mission-critical apps where capacity must always be available to meet demand, and on-demand pricing for just about everything else.
However, the success of both options likely left Amazon with a big problem: excess capacity. The success of reserved instances means that Amazon has to keep around enough capacity to guarantee that it can handle any spike in demand that might come along. The success of on-demand pricing means that Amazon has to build out new capacity fast enough to stay ahead of the voracious demand curve.
So, what to do? Enter spot pricing. Amazon's new pricing is an incredibly creative way to encourage consumption of unused data center capacity, by providing that capacity at clearance sale prices on the condition that Amazon can take it back at a moment's notice. For the right kind of applications, it's a true win-win situation.
Why not profit from what would otherwise be a liability?
Note, however, that this feature is not market-based pricing. Amazon determines the spot price and can raise that price enough to gain back capacity at will, at no real cost to itself. There is no competition. There is no commoditization. There is just consumption of what is not being used.
The truth is, real commoditization of infrastructure services--or any other cloud service, for that matter--isn't in the best interest of Amazon or any other service provider.
Regardless, commoditization can't happen without open standards that allow easy portability and interoperability of data and code, as well as security, control, service-level assurance and compliance systems. Those standards are coming, but it is impossible to predict when they will arrive. I only hope Amazon embraces them when they do.
In the meantime, we can watch with admiration how the success of Amazon Web Services allows it to explore the future of IT with the enthusiastic help of a customer base that truly benefits from each success. I can't wait to see how customers choose to take advantage of spot pricing.
One of the most interesting aspects of the weeks leading up to and including this year's VMworld was the incredible innovation in cloud-computing service offerings for enterprises--especially in the category of infrastructure as a service. A variety of service providers are stepping up their cloud offerings, and giving unprecedented capabilities to their customers' system administrators.
In this category, enterprises are most concerned about security, control, service levels, and compliance; what I call the "trust" issues. Most of the new services attempt to address some or all of these issues head on. Given that this is the infancy of enterprise cloud computing, I think these services bode well for what is coming in the next year or two.
Here is a brief analysis of the offerings that recently caught my eye:
Amazon Web Services Virtual Private Cloud: There is no doubt that the smart people at Amazon continue to innovate at a breathtaking pace. The last three years have seen a whirlwind of new and upgraded services, ranging from storage and server capacity, to payment processing and content delivery.
Amazon's new Virtual Private Cloud offering is just another example of how they listen to their customers when they build solutions. Not so much unique and innovative as a near-perfect execution of a simple solution to a raft of thorny problems, Amazon's VPC service is essentially a powerful VPN gateway that allows Amazon services to be added to the customer's network.
Now, this doesn't directly address security, compliance, or service levels, but it gives enterprise customers a level of control over network configuration that was previously unavailable from Amazon, which in turn gives the customer greater latitude to address those issues.
Savvis "Project Spirit": Available in beta "by the end of this year," Savvis's Project Spirit adheres to a "Virtual Private Data Center (VPDC)" concept very similar to the Virtual Data Center vision espoused by Sun. In a video providing an overview of the service, Savvis indicates that Project Spirit provides three tiers of service, each with an increasing set of capabilities and improved quality of service (QoS).
The video demonstrates wizard-based provisioning and drag-and-drop resource topology design, both of which are similar to features from GoGrid and Sun, though perhaps a little more aligned with the latter than the former.
What I like about Project Spirit is its sense of configurability; something that I think has been missing from many IaaS offerings to date.
Terremark vCloud Express: Terremark is one of the first out of the gate with a basic "one server at a time" offering based on VMware's vCloud Express infrastructure. Targeted at the same users who find Amazon's EC2 so easy to use, the service is meant as a simple, low-risk way for customers to acquire compute capacity.
In a video recorded at VMworld, Simon West, Terremark's VP of marketing, demonstrates provisioning a server in the service. Like other services in its class, it focuses on allowing you to select a server image from a menu of possibilities, click a button, and boot the resulting server in a few minutes. Pricing starts at $0.036/hr for a one-"VPU," 0.5GB server, but as Chris Flex of Citrix Systems notes in a blog post, Terremark charges differently than Amazon, so the lower CPU price does not necessarily mean lower overall operating costs.
Terremark's new service complements its existing Enterprise Cloud service, which is targeted at larger, more sophisticated infrastructure needs.
OpSource Cloud: Hosting vendor OpSource is taking a more network-centric approach toward cloud definition, similar to the "subnets" that Amazon allows customers to create in its VPC offering. The OpSource cloud is in pre-beta now, with an October target for "public release." When the OpSource team demonstrated their user interface to me, they showed me a metaphor that begins with the definition of a "network," which is isolated through custom routing capabilities at the OpSource data centers.
Each network comes with eight public IP addresses (more can be added), and you can add resources such as servers, storage, and firewalls as you see fit. You can also create as many networks as you'd like for each account.
Obviously, there are many more offerings like these in the market today. However, it is interesting to note that the common theme here seems to be security, whether through "isolation" via networking, through the availability of enterprise-class firewalls, load balancers, and the like, or both. The expansion of virtual data center offerings is also interesting, as I think it shows the early growth of what will likely be the true enterprise cloud-computing space.
Access control and user account management were a little sketchy in most of the services I saw, although some showed real promise.
However, one has to wonder, as application architectures adjust to cloud computing, how much longer they are going to be tightly coupled to data center architectures. At what point will it no longer be advantageous for application owners to define infrastructure in terms of servers, storage, and security devices?
That being said, the independence of distributed applications from underlying architecture is a long way off, even from the enterprise perspective. I expect that by this time next year, we will see a stable of very strong enterprise public cloud offerings, with support for various compliance standards, sophisticated networking, and cloud-centric security services and technologies.
This is just the beginning of a long evolution, folks.
On the third anniversary of its Elastic Compute Cloud launch, Amazon Web Services late Tuesday announced a new service, the Virtual Private Cloud.
Targeted at customers with existing IT investments, the Virtual Private Cloud (VPC) service provides a way for companies to create a logically separated set of Elastic Compute Cloud (EC2) instances and a secure VPN connection to their own networks.
(Figure: Amazon Web Services illustrates how the Virtual Private Cloud functions. Credit: Amazon.com)
Jeff Barr, Amazon Web Services strategist, said in a blog post that the service requires three elements: a VPC instance, an IPSec VPN gateway, and a block of IP addresses provided by the customer. The VPC's address space can range from 16 addresses (known to network administrators as a /28 address range) to 16,384 addresses (a /18 address range), and the addresses can be divided up into subnets to further partition traffic.
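For readers less familiar with prefix notation, the address counts quoted above can be checked with Python's standard `ipaddress` module. This is a small sketch; the `10.0.0.0` base address and the helper names are assumptions chosen for illustration, not anything specified by Amazon.

```python
import ipaddress
from typing import List


def address_count(cidr: str) -> int:
    """Number of addresses in a CIDR block: 2 ** (32 - prefix length) for IPv4."""
    return ipaddress.ip_network(cidr).num_addresses


def split_into_subnets(cidr: str, new_prefix: int) -> List[ipaddress.IPv4Network]:
    """Divide a block into equal-sized subnets with the given prefix length."""
    return list(ipaddress.ip_network(cidr).subnets(new_prefix=new_prefix))


smallest = address_count("10.0.0.0/28")   # smallest VPC block described above
largest = address_count("10.0.0.0/18")    # largest VPC block described above
subnets = split_into_subnets("10.0.0.0/18", 24)  # partition into /24 subnets
```

A /28 leaves 4 host bits (16 addresses) and a /18 leaves 14 host bits (16,384 addresses). Splitting the /18 into /24 subnets yields 64 blocks of 256 addresses each.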
All Internet-bound traffic is routed through the customer's network and outbound security systems before reaching the public network, Barr said.
Amazon.com Chief Technology Officer Werner Vogels described Amazon's vision for the service in a blog post:
(CIOs) have bought into the cloud as a target for a significant portion of their services, as the benefits are too obvious to ignore, and most expect that their transition will be a continuous process. They would accelerate the adoption of cloud services if they could access a form of cloud that would give them the best of both worlds: the flexibility and cost-effectiveness of accessing a virtually infinite pool of resources without owning it, while being able to integrate those resources into their existing datacenter environments such that they could continue to leverage existing investments in their management and control infrastructure...
We have developed Amazon Virtual Private Cloud (Amazon VPC) to allow our customers to seamlessly extend their IT infrastructure into the cloud while maintaining the levels of isolation required for their enterprise management tools to do their work.
Some Amazon Web Services capabilities, such as Amazon EC2 security groups, DevPay AMIs, and Internet-facing IP addresses, are not supported in Amazon VPC at the start. The VPN service has been tested with equipment from Cisco Systems and Juniper Networks.
VPC pricing is based on a $0.05 hourly charge for VPN access, plus a cost for data transfer into and out of the connection, ranging from $0.10/GB to $0.17/GB. Charges for other Amazon Web Services, including Amazon EC2, are billed separately at Amazon's standard rates.
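As a rough worked example of the pricing above, the sketch below estimates the VPC-specific portion of a bill. The 720-hour month and 100GB of transfer are hypothetical inputs, and the function name is my own. EC2 and other service charges are excluded, since they are billed separately.

```python
from decimal import Decimal

VPN_HOURLY_RATE = Decimal("0.05")    # per VPN connection-hour, as quoted above
TRANSFER_RATE_LOW = Decimal("0.10")  # per GB, low end of the quoted range
TRANSFER_RATE_HIGH = Decimal("0.17") # per GB, high end of the quoted range


def vpc_charge(hours: int, transfer_gb: int, rate_per_gb: Decimal) -> Decimal:
    """VPN access charge plus data transfer charge, in dollars."""
    return VPN_HOURLY_RATE * hours + rate_per_gb * transfer_gb


# Hypothetical month: 720 hours of VPN access and 100GB transferred.
low_estimate = vpc_charge(720, 100, TRANSFER_RATE_LOW)
high_estimate = vpc_charge(720, 100, TRANSFER_RATE_HIGH)
```

Under these assumptions the VPN connection accounts for $36 of the month's bill, and the transfer charge moves the total between $46 and $53 depending on the per-gigabyte rate that applies.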
The debate about the validity of internal cloud implementations has raged on for some time now, with some claiming that cloud computing and wholly owned infrastructure don't mix, and others pointing out that applying "on demand," "at scale," and "multitenant" to enterprise IT data centers offers unique advantages to those who have already made that investment. It has been difficult, however, to do an objective comparison of the two approaches--until now.
The announcement on Thursday of Amazon's new Hadoop-based Elastic MapReduce service, combined with the introduction of a commercial Hadoop distribution from start-up Cloudera, means that we finally have a reasonable means of watching which directions enterprise IT prefers. Let me explain.
Amazon's service is a simplified, prepackaged Hadoop implementation that can be leveraged by anyone with an Amazon account. The Amazon Web Services blog describes it as follows:
Today we are rolling out Amazon Elastic MapReduce. Using Elastic MapReduce, you can create, run, monitor, and control Hadoop jobs with point-and-click ease.
You don't have to go out and buy scads of hardware. You don't have to rack it, network it, or administer it. You don't have to worry about running out of resources or sharing them with other members of your organization. You don't have to monitor it, tune it, or spend time upgrading the system or application software on it.
You can run world-scale jobs anytime you would like, while remaining focused on your results. Note that I said jobs (plural), not job. Subject to the number of EC2 (Elastic Compute Cloud) instances you are allowed to run, you can start up any number of MapReduce jobs in parallel. You can always request an additional allocation of EC2 instances here.
Processing in Elastic MapReduce is centered around the concept of a Job Flow. Each Job Flow can contain one or more steps. Each step inhales a bunch of data from Amazon S3, distributes it to a specified number of EC2 instances running Hadoop (spinning up the instances if necessary), does all of the work, and then writes the results back to S3.
Each step must reference application-specific "mapper" and/or "reducer" code (Java JARs or scripting code for use via the Streaming model). We've also included the Aggregate Package with built-in support for a number of common operations such as Sum, Min, Max, Histogram, and Count. You can get a lot done before you even start to write code!
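To make the "mapper" and "reducer" roles concrete, here is a minimal word-count sketch in Python. It runs entirely in memory on one machine and only imitates the map, group-by-key, and reduce phases of a single step. It does not use Hadoop, S3, or any Elastic MapReduce API, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Combine all counts emitted for one word into a single total."""
    return key, sum(values)


def run_step(lines: Iterable[str]) -> Dict[str, int]:
    """Simulate one step: map every line, group by key, then reduce each group."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    # Hadoop hands keys to reducers in sorted order; sorting mimics that here.
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


counts = run_step(["the cloud", "The spot price"])
```

In a real job, the map and reduce phases would be spread across many instances, with input and output living in S3 as the description above explains. The per-record logic, however, stays about this small.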
Cloudera, on the other hand, provides a Hadoop build that you can deploy wherever you wish:
Cloudera's Distribution for Hadoop is based on the most recent stable version of Apache Hadoop. It includes some useful patches back-ported from future releases, as well as improvements we have developed for our support customers.
Cloudera's Distribution includes everything you need to configure and deploy Hadoop using standard Linux system administration tools.
Here's what I'm thinking: enterprise IT is looking at an entirely new class of applications that take advantage of MapReduce to process very large sets of both structured and unstructured data for things like predictive analysis, sorting/sequencing, and data mining. Both commercial Hadoop offerings meet the demand for a platform to simplify the development and operation of these applications. The primary difference is the where, not so much the what.
That is exactly what will make the competition between the two offerings so compelling to watch. Let me break it down for you:
Will the requirement to own and operate hardware work against Cloudera? What makes the Amazon offering so groundbreaking (and it will prove to be historic, in my opinion) is that it is now possible for anyone with a need to analyze large data sets to do so simply for the cost of data storage plus processing time. (Note that the use of Elastic MapReduce adds a nominal cost on top of the server instances that run the jobs.)
Where "grid computing" was once the playground of large enterprises and academic institutions that could justify the cost of buying the hardware and building grids out, Amazon makes it possible for even individuals to run such jobs for a few tens or hundreds of dollars.
Cloudera, on the other hand, requires that the hardware be available to install it on. That either means existing server capacity, new hardware (which greatly adds to the cost, and can only be justified for continuous Hadoop use), or leased capacity. The latter starts to look a lot like Amazon's service.
Will Amazon's requirement to use S3 work against it? There are three reasons why I see it might:
- The commonly cited concern about data security outside of corporate firewalls. (Even if the perception is wrong, the perception exists.)
- The cost of data transfer to and from the S3 service--currently as high as 17 cents per gigabyte.
- The cost of storage of both the raw data and the aggregate results--currently as high as 15 cents per gigabyte a month.
It should be rightly noted that if you already rely on S3 to store your data sets to be processed, this is a great deal. However, if you have to upload terabytes or even petabytes of data to be combed through by MapReduce, then this could get quite pricey on its own, and existing infrastructure might serve the purpose well. If you are going to leave the data up there permanently--and update it regularly--the cost of Amazon's service should be weighed against the cost of owning and operating that storage yourself in your existing facilities.
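A back-of-the-envelope calculation shows how quickly this adds up. The sketch below uses the "as high as" rates quoted above as an upper bound. The one-terabyte data set (taken as 1,000GB) and the function name are assumptions for illustration, and the cost of transferring results back out is not counted.

```python
from decimal import Decimal

TRANSFER_PER_GB = Decimal("0.17")       # upper-bound transfer rate quoted above
STORAGE_PER_GB_MONTH = Decimal("0.15")  # upper-bound storage rate quoted above


def s3_upper_bound_cost(data_gb: int, months_stored: int) -> Decimal:
    """One-time upload cost plus storage for the given number of months, in dollars."""
    upload = TRANSFER_PER_GB * data_gb
    storage = STORAGE_PER_GB_MONTH * data_gb * months_stored
    return upload + storage


one_month = s3_upper_bound_cost(1000, 1)   # upload 1TB and keep it for a month
one_year = s3_upper_bound_cost(1000, 12)   # upload 1TB and keep it for a year
```

At these rates a single terabyte costs up to $320 for the first month and up to $1,970 over a year, before any compute time. Scale that to petabytes and the comparison with storage you already own becomes a serious exercise.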
Will the so-called "barrier of exit" stand up? I'm not even arguing that the choice will be based solely on the comparative costs to the business. In fact, what I am interested in is the extent to which business units and departments will simply bypass IT altogether to build and run their own jobs in Amazon Elastic MapReduce.
If IT maintains a valuable service using existing facilities and computing investments, then Cloudera will likely do fine. If not, then Amazon stands to be the overwhelmingly dominant commercial Hadoop implementation.
I should also note that running a Hadoop instance is not the same thing as cloud computing in and of itself. An internal Cloudera implementation is not necessarily an internal cloud, though if operated "on demand," "at scale," and with multitenancy, it certainly qualifies as a cloud.
I will be watching this space closely for the next year or two. I have a feeling that Amazon will do fine, regardless, as there are many possible implementations that would benefit from a completely public cloud implementation. The real test is probably how much opportunity Cloudera finds within enterprise data centers.
Cloudera also has much more competition from the free downloads of Hadoop than Amazon has, in my opinion, as it faces a more traditional open-source competitive landscape.
Is your company looking at MapReduce for a new generation of data-mining applications? If so, what will you choose: the public, external cloud implementation of Hadoop from Amazon Web Services, or the wholly owned, internal implementation of the same from Cloudera?
There were two very interesting pieces of news to come out in the last week related to the availability of relational databases in the cloud. One involves a start-up you have almost certainly never heard of, and the other involves a major player in on-premise database products.
The first was an announcement to the crowd at "Whose Cloud is It Anyway?"--a "roundtable and meet-up" sponsored by TechCrunch, held Friday on Microsoft's Mountain View, Calif., campus.
(Charles Cooper has more on the "roundtable" portion of the program. My favorite part of the afternoon was the fun comment by Salesforce.com CEO Marc Benioff; he noted the irony of hosting a cloud-computing meeting at the facilities of the vendor most disrupted by the trend.)
During the "pitch" section of the afternoon, Justin Santa Barbara of start-up FathomDB announced that the company has released to beta testing a sort of virtual managed hosting service for "standard relational databases" running on Amazon.com's Elastic Compute Cloud, or EC2, service. (There is a video of the afternoon's pitches; FathomDB starts at about 49:30.)
The start-up's current service simply allows someone to get a basic relational database management system, or RDBMS, instance (initially MySQL) up and running in minutes under its management, with services including creation, monitoring, and backup.