
The Pervasive Data Center

November 19, 2009 9:23 AM PST

The new optimizations for capability computing

by Gordon Haff

This is the time of year to take stock of where high-performance computing (HPC) sits and where it is headed. That's because the SC09 conference is taking place in Portland, Ore., this week and it's the biggest HPC conference around.

SC is an odd duck as conferences go. Last year it had more than 10,000 attendees and, yet, it's a largely volunteer-organized event in a world where trade shows of this scope are packaged by conference specialists or some specific corporation. Think the much-renamed LinuxWorld (run by IDG) or VMworld (run by VMware).

"SC" comes from supercomputing. Today's large computer complexes are typically not supercomputers in the sense of a specialized architecture only suitable for a specific type of technical computing. Rather, as Ashlee Vance notes in The New York Times, "The supercomputing world was long dominated by systems that required specialized chips, memory systems and networking technology. But about 10 years ago, researchers realized they could link thousands of cheaper machines running on mainstream chips and achieve pretty solid performance."

Thus an HPC event is no longer about supercomputers per se (although the term is still used as a convenient moniker for a collection of resources managed as a single entity in a single location). Rather it's about the computing components, the interconnects, the storage, and the software that ties everything together and the applications that run on top.

The Top500 nicely illustrates the evolution of HPC over time. This list, released twice annually, ranks the largest publicly acknowledged supercomputers--as the term is used today--on the basis of a somewhat simplistic, but objective, benchmark. The Top500 entries are certainly not typical of mainstream HPC; they're the biggest of the big. But they nonetheless provide some quantitative insight into important trends.

The newest iteration of the list was released Friday. There were no striking departures from the trends of the last few years, but there was some continued evolution that's worth taking note of.

The continued rise of InfiniBand. InfiniBand is a system interconnect that offers a higher-performance alternative to the ubiquitous Ethernet. Although its initial backers envisioned a broader role for the technology, it's settled nicely into HPC and, to a lesser degree, back-end commercial data center functions like database clusters where low latency and high bandwidth are also paramount. (The Sun/Oracle Exadata appliance uses InfiniBand, for example.)

(Credit: TOP500.org)

InfiniBand's initial growth in HPC wasn't so much about displacing Ethernet as it was displacing the fractured collection of high-performance interconnects that preceded it. Myricom's Myrinet and Quadrics' QsNet were the most common of these, but there were many. This year InfiniBand is deployed on 181 of the Top500, a 28 percent increase from a year ago.

That's clearly a striking increase. But what is perhaps more striking is that about half that increase came at the expense of Ethernet rather than from mopping up a variety of older or proprietary connection technologies. This shift started between 2007 and 2008 but was even more pronounced this year.

It's certainly possible that the next 10GbE generation of Ethernet, which today is essentially absent from the list, could again push Ethernet's numbers higher. However, whatever the specific technology, the message that I take away is that large computer clusters are starting to favor more optimized interconnects even if they and the components they connect are largely off-the-shelf.

And we see an analogous trend with the proliferation of blade servers as well. Blades, a more modular and pluggable approach to system design, have proven popular in many enterprises and midmarket companies, in part, because they help bring together computing, storage, and networking technologies into a single integrated whole. That type of integration isn't of much interest in HPC. Rather, blades play to HPC by offering high densities and reducing cable count and complexity.

In fact, among x86 servers at any rate, dominance is not too strong a word to describe the presence of blades in the Top500. Consider just one vendor, Hewlett-Packard. HP has 208 ProLiant systems on the list. A full 203 of these, almost 98 percent, are ProLiant c-Class blades.

Collectively, these developments point toward what might be called building optimization around standardization. In the main, especially as one moves down from the very top of the list, the Top500 is composed mostly of systems using mainstream technologies such as x86, Linux, and standard interconnects. Clusters are the dominant architecture.

But we're increasingly not seeing mere rackmount servers connected by Gigabit Ethernet. As the systems on the Top500 list grow in capability, we're seeing more focus on how they're packaged, powered, and connected.

November 17, 2009 4:30 PM PST

Observations from an EMC analyst day

by Gordon Haff

On the one hand, vendor analyst events are a good opportunity to spend focused time diving deep into individual products, roadmaps, and corporate initiatives. On the other, they're a useful forum for getting the feel of a company's overall zeitgeist in a way that narrower discussions don't. EMC's event, held last week in Franklin, Mass., was no exception.

(Credit: EMC)

Perhaps the single thing that struck me most about the event was how fully VMware was integrated into the discussion. I've been following both companies since before EMC acquired VMware in 2003. In the years since, although there were the obligatory nods to joint development work and "better together," VMware aggressively maintained a distance that was hardly limited to the 3,000 miles between VMware's Palo Alto, Calif., headquarters and EMC in Massachusetts. VMware's presence at EMC analyst events was largely relegated to a few off-hand mentions and perhaps a desultory breakout session given by a junior marketing person.

This year couldn't have been more different. VMware was very much woven into just about every discussion and one of VMware's senior technologists shared a panel with representatives from EMC and Cisco Systems. One thing that has changed, of course, is the ouster of VMware founder and CEO Diane Greene in 2008. It was Greene who most vocally kept EMC at arm's length. It's also the case that virtualization is increasingly at the center of everything that EMC does, so how could VMware not be an integral part?

This pervasive virtualization theme carried through to EMC VP Jon Peirce's discussion about EMC's internal IT infrastructure as well. EMC IT is using VMware to virtualize as much as possible. This includes doing database testing on a Cisco Unified Computing System (UCS) in advance of a planned migration off Sun E25000 UltraSPARC-based servers.

An initial Virtual Desktop Infrastructure (VDI) deployment also uses UCS in the form of a vBlock--a preconfigured package that combines products from Cisco, EMC, and VMware. EMC has about 200 users on VDI today and expects to roll out to several thousand next year starting in their Franklin facility. VDI and associated forms of desktop virtualization are a favorite technology of CEO Joe Tucci, who would like to move toward a platform-agnostic client strategy.

The ultimate goal is what sometimes goes by BYOPC (Bring Your Own PC), in which employees provide their own notebook computers, perhaps purchased with the help of a stipend. Even today, many of the EMC execs at the event were sporting Macs, even though IT doesn't officially support them.

Another hot topic at the event was multi-tier storage, in this case automatic storage tiering that intelligently moves data between Flash-based storage and conventional disk drives. EMC's technology here is called FAST and will roll out on Symmetrix V-Max arrays.

Flash drives can be much faster than SATA disks--or even high-performance Fibre Channel drives--but they're also much more expensive on a per-GB basis. The idea behind FAST is to automate the placement of data based on the way it's accessed. For example, a database index that is frequently read and written to will migrate to high-performance flash while older data that hasn't been touched for a while will move to slower, cheaper disks.
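To make the idea concrete, here is a minimal sketch of an access-driven placement policy. It is not EMC's algorithm: the tier names, the thresholds, and the `Extent` record are all assumptions of mine, chosen only to show how "hot" data drifts toward flash and "cold" data toward cheaper disk.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical thresholds; a real array would derive these from observed workloads.
HOT_ACCESSES_PER_DAY = 100
COLD_DAYS_IDLE = 30


@dataclass
class Extent:
    """A chunk of stored data plus the access statistics used to place it."""
    name: str
    accesses_per_day: float
    days_since_last_access: int
    tier: str = "fc_disk"  # everything starts on the middle tier


def choose_tier(extent: Extent) -> str:
    """Pick a tier purely from access statistics (illustrative policy only)."""
    if extent.accesses_per_day >= HOT_ACCESSES_PER_DAY:
        return "flash"
    if extent.days_since_last_access >= COLD_DAYS_IDLE:
        return "sata_disk"
    return "fc_disk"


def rebalance(extents: List[Extent]) -> List[Tuple[str, str, str]]:
    """Move each extent to its chosen tier; return (name, old_tier, new_tier) per move."""
    moves = []
    for extent in extents:
        target = choose_tier(extent)
        if target != extent.tier:
            moves.append((extent.name, extent.tier, target))
            extent.tier = target
    return moves


extents = [
    Extent("db_index", accesses_per_day=5000, days_since_last_access=0),
    Extent("q3_orders", accesses_per_day=20, days_since_last_access=2),
    Extent("2006_archive", accesses_per_day=0, days_since_last_access=400),
]
moves = rebalance(extents)
for move in moves:
    print(move)
```

In this toy run the busy index is promoted to flash, the untouched archive is demoted to SATA, and the moderately used data stays put. The point of automating it is that nobody has to make those calls by hand.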

Disks being used to store rarely accessed archival data can be deduped, compressed, and even spun down to reduce overall data center power consumption. Tape isn't part of this vision; Tucci opined that "Backup to and recovery from tape is dead."

The idea of storage tiering isn't new. Hierarchical storage management (HSM) has been around for well over a decade. However, in practice, it's mostly ended up being about moving old files to tape for archive purposes. (EMC itself has a product in this vein: Legato DiskXtender.) FAST is something more transparent and more dynamic.

There are analogs between FAST and the storage pooling that is part of Sun Microsystems' ZFS filesystem. EMC argues that the function belongs on the storage device rather than the server because the array is where data access from multiple systems and applications comes together.

It's unsurprising that EMC wants storage to be at the center of things. This is a company, after all, whose tagline is "where information lives." It is, however, worth remembering that this is a different lens through which to view the world than system vendors tend to choose--and, for that matter, than VMware chose historically.

November 9, 2009 9:15 AM PST

VMware elevates its desktop virtualization view

by Gordon Haff

Although VMware got its start with a desktop virtualization product aimed at developers, the company today is best known for bringing server virtualization to the mainstream.

Creating multiple virtual servers on a single physical system lets IT departments consolidate applications onto fewer computers and thereby cut costs. Over time, server virtualization has also enabled a variety of products and approaches that can simplify IT operations and generally make data centers more flexible.

VMware has continued to invest in virtualization aimed at the client. This includes client-side hypervisors such as its original VMware Workstation product. However, products and technologies associated with delivering applications and user desktops to the client are really the main focus.

Application and desktop delivery sometimes makes use of client hypervisors but it's a largely separate category of technology that's fundamentally about centrally managing user applications and/or operating-system images. In VMware's case, virtualized desktops fall under the VMware View name.

On Monday, VMware announced VMware View 4, the latest version of its virtual desktop portfolio.

Much of VMware's development focus with View 4 was in the area of the user experience--that is, making applications and desktops delivered from a central location perform with the same responsiveness and fidelity as if they were installed on a local PC, in the usual way.

Historically, this user experience has been one of the stumbling blocks for desktop virtualization in general. Older forms of Citrix Presentation Server (now rebadged and modernized under the XenApp label) and initial virtual desktop infrastructure (VDI) implementations very much tried to simplify management and otherwise deliver direct benefits for IT operations. Whether users liked using the products was secondary.

As a result, desktop virtualization has been mostly something used by what are often called "task workers." Think call centers and other groups of users with specific jobs to do and not much say about the tools they use to do it. In general, desktop virtualization promoters have focused too much on delivering benefits to IT and not enough on delivering benefits to users. (They've also arguably paid too little attention to keeping up-front costs down and relied too much on promises of soft cost savings down the road.)

One of the technology pieces that VMware is leaning on to improve user experience is PC-over-IP (PCoIP). PCoIP was originally developed by Teradici to improve the responsiveness and display quality of virtual desktops. However, in Teradici's initial implementation, specialized hardware was needed on both ends of the wire. This effectively made it a premium solution for situations in which cost wasn't a factor, such as for financial traders and government agencies for which security considerations are paramount.

VMware has worked with Teradici to create a software-only version of the protocol. Desktop virtualization Chief Technology Officer Scott Davis goes into a lot of the details on his blog.

It's a User Datagram Protocol-based server-side protocol that transmits compressed bitmaps or frames to the remote client. This has the advantage of being able to make real-time adjustments to account for the available bandwidth and latency of the communications channel; the display quality degrades if there isn't enough bandwidth, but things still "work."
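The graceful-degradation idea can be sketched in a few lines. To be clear, this is not how PCoIP itself is implemented, which I haven't seen. The function name, the linear size-versus-quality model, and all of the figures are assumptions made purely to illustrate trading image quality for bandwidth.

```python
def pick_quality(frame_bytes_at_full_quality: int,
                 bandwidth_bytes_per_s: float,
                 target_fps: float = 30.0,
                 min_quality: float = 0.1) -> float:
    """Return a quality factor in [min_quality, 1.0] that fits the per-frame byte budget.

    Assumes, for illustration only, that encoded frame size scales linearly with quality.
    """
    budget_per_frame = bandwidth_bytes_per_s / target_fps
    quality = budget_per_frame / frame_bytes_at_full_quality
    # Clamp: never exceed full quality, never drop below a usable floor.
    return max(min_quality, min(1.0, quality))


FULL_FRAME_BYTES = 200_000  # made-up size of one frame at full quality

for mbps in (100, 20, 2):
    bandwidth = mbps * 1_000_000 / 8  # convert Mbit/s to bytes/s
    quality = pick_quality(FULL_FRAME_BYTES, bandwidth)
    print(f"{mbps:>3} Mbit/s -> quality factor {quality:.2f}")
```

With plenty of bandwidth the sketch sends full-quality frames. As the link shrinks it lowers quality rather than dropping the session, which is the "degrades but still works" behavior described above.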

Although details differ, there are similarities to Sun's Appliance Link Protocol--which is well-regarded for its ability to deal with poor-quality connections. (A downside of server-side protocols is that they consume processing horsepower on the server, where it tends to be more expensive, rather than on the client.)

VMware will continue to support other remote display protocols, most notably Microsoft's Remote Desktop Protocol. However, VMware is clearly positioning PCoIP as its favored technology and a point of competitive differentiation for VMware View in general.

Also in the graphics area, View 4 adds "multimonitor, adaptive display support--resolution optimization for each monitor, with an option to pivot and rotate the display output, supporting rich audio and video content with increased performance."

Other user experience enhancements generally relate to better integration with the overall desktop environment. For example, View Printing automatically discovers local printers without the need to install print drivers. View Unified Access provides a single point of authentication across VMware View environments, Windows Terminal Servers, Blade PCs, and remote physical PCs.

VMware View 4 comes in two editions. The Enterprise Edition includes the basics: vSphere 4 (the back-end server virtualization product), vCenter 4 (management), and View Manager 4 (for provisioning user access). It's priced at $150 per concurrent connection.

The $250-per-concurrent-user Premier Edition adds ThinApp 4 (for delivering ad hoc applications that aren't part of a master image) and View Composer (for managing images), both capabilities that would typically be desired in a large or sophisticated deployment.

VMware as a whole approaches the world from the perspective of the enterprise data center. Delivering desktops from that data center was somewhat of a sideshow. Is it now as focused on application delivery as, say, Citrix? Not really. But that said, desktop virtualization has moved beyond the sideshow stage at VMware.

November 6, 2009 6:00 AM PST

Intel's James Reinders on parallelism - Part 2

by Gordon Haff

Intel's James Reinders is an expert on parallelism; his most recent book covered the C++ extensions for parallelism provided by Intel Threaded Building Blocks. He's also the Director of Marketing and Business for the company's Software Development Products. In Part 1 of our discussion at the Intel Developer Forum in September we talked about how to think about performance in a parallel programming environment, why such environments give developers headaches, and what can be done about it.

Here, in Part 2, we move on to cloud computing, functional and dynamic languages, and what needs to happen with computer science education.

Few wide-ranging conversations these days would be complete without at least a nod to cloud computing, which Reinders views as very much connected to the matter of parallel programming.

Cloud computing is parallel programming. You're solving the same problem. In fact, someone that's good at decomposing a program to run in parallel on a multicore or on a supercomputer... the same thought process is necessary to decompose a problem in cloud computing. What's different in cloud computing is that the cost of a connection or a communication between two different clouds is so high. You really need to get it right. It works best when a little message is sent, does an enormous amount of computing, and gets a little message back.

Data parallelism tends to be very fine-grained.

Task parallelism like we see with Cilk and Threaded Building Blocks is a little bit more coarse.

Cloud computing has to be very very coarse-grained parallelism.

But there's something common about how you have to think about it.

The tools that will let people do cloud computing, express a problem in cloud computing, may eventually just map onto a multicore.

The granularity that Reinders discusses refers to how small a chunk of computing can be, given the cost and latency of communications. Within a single processor, communications bandwidth is high and latencies low, so software can afford to perform a relatively small task and then synchronize the results. (Although moving large amounts of data can still be relatively "expensive" which is why data parallelism can be finer-grained than task parallelism; see Part 1 for further background on data parallelism.)

By contrast, external communication networks have limited bandwidth and are relatively slow--roughly four or five orders of magnitude slower than communications within a system. Therefore, tasks have to be parceled out in relatively large chunks that, ideally, don't have to be packaged up with a significant amount of local data.
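A back-of-the-envelope calculation shows why granularity follows from communication cost. The sketch below assumes a very simple model in which each task pays one fixed communication delay, and the two latency figures are illustrative guesses rather than measurements.

```python
def efficiency(compute_s: float, comm_s: float) -> float:
    """Fraction of a task's wall-clock time spent on useful computation."""
    return compute_s / (compute_s + comm_s)


def min_task_size(comm_s: float, target_efficiency: float) -> float:
    """Smallest compute time per task that still reaches the target efficiency.

    Solves compute / (compute + comm) = target for compute.
    """
    return comm_s * target_efficiency / (1.0 - target_efficiency)


# Assumed round-trip costs, five orders of magnitude apart.
ON_CHIP_S = 100e-9   # about 100 ns between cores in one system
NETWORK_S = 10e-3    # about 10 ms across an external network

for label, comm in (("on-chip", ON_CHIP_S), ("network", NETWORK_S)):
    chunk = min_task_size(comm, target_efficiency=0.9)
    print(f"{label}: each task needs at least {chunk:.2e} s of computing")
```

Under these assumptions, staying 90 percent efficient takes tasks of just under a microsecond on a single system but about a tenth of a second across the network. That is the gap between fine-grained and "very very coarse-grained" parallelism.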

Next up was education. Here, Reinders' basic message was to focus on the theory before diving into the implementation details. I suspect that this highlights one of the key challenges: Parallel programming tends to require a solid grasp of programming theory and doesn't lend itself particularly well to just "hacking around" in the absence of that grounding.

I've been doing a lot in the area of teaching parallelism. What a lot of people think of right away is teach them locks, teach them mutexes [algorithms to prevent the simultaneous use of a common resource], teach about how to create a thread, destroy a thread. That's all wrong. You want to be talking at a higher level. How do you decompose an algorithm? What is synchronization in general? Why does it exist?

Things I would hope undergraduates would learn are parsing theory, DAG representations [a tool used to represent common subexpressions in an optimizing compiler], database schemas, data structures, algorithms. All these are high level, not things like [the programming language] Java. Parallel programming's like that too. You get hands-on touching the synchronization method or whatever but you want to teach the higher level key concepts.

Some people it's going to be more in-tune with their thinking but you try and teach it to everyone.

Given that most of today's languages weren't expressly designed for parallel programming, discussions about parallelism often turn to new programming languages. This means functional languages most of all but can also involve dynamic or scripting languages which generally handle more low-level details under the covers than do Java or C++.

Functional languages don't lend themselves to easy, or easily comprehensible, description. A common shorthand is that "Functional programming is a style of programming that emphasizes the evaluation of expressions, rather than execution of commands." But that probably doesn't help much if you don't already know what it is. As for Wikipedia's entry, Tim Bray--no programming slouch--called it fairly impenetrable. (Perhaps you begin to see the problem.)
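A small example may help more than a definition. Python is not a functional language, so treat this as a rough sketch of the two styles rather than a demonstration of a language like Erlang. The order amounts and the 8 percent markup are made up.

```python
from functools import reduce

orders = [12.50, 40.00, 7.25, 99.99]

# "Execution of commands": a sequence of statements that update `total` in place.
total = 0.0
for amount in orders:
    if amount > 10:
        total += amount * 1.08

# "Evaluation of expressions": the answer is the value of a single expression,
# built from functions, with no variable being modified along the way.
total_expr = reduce(
    lambda acc, amount: acc + amount * 1.08,
    filter(lambda amount: amount > 10, orders),
    0.0,
)

print(total == total_expr)
```

Both versions compute the same number. The second has no step-by-step state to keep consistent, which is a large part of why functional styles keep coming up in discussions of parallelism.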

A couple of things I'm interested in functionals for. We don't wake up one day and everyone uses. It's sequential semantics again and sequential semantics appeal to people and functional languages don't have them. But some people eat them up.

And they solve amazing problems. You can code things up in them that are much easier to understand than if they are written in a traditional language although they can be cryptic or terse to a lot of programmers.

Erlang [a functional language] has gotten a bit more and more usage. Maybe it is creeping in. It's not going to take over the world overnight but it seems like the one that might stay around. May be talking about it 20 years from now and saying, yeah, Erlang's been around for 25 years. It might be accepted as a language. It may have legs.

But even Java. [Unlike Erlang,] It appealed to people who programmed in C and C++; it didn't challenge them to think differently. And because of the strict typing and stuff it helps [the enterprise developer] to deploy certain types of apps.

Python [a dynamic language] is interesting. It is so popular with a lot of scientists. It's on my short list of things, where if we can figure out where to partner or extend some of the things we're doing, Python's on my short list of languages that we want to help with parallelism. Maybe some of our Ct technology would apply there. We'll see if other people agree with us. Think the concepts we're talking about are pretty portable. 

Finally, we concluded our discussion with hardware. Are there opportunities at the hardware and firmware level with memory subsystems or with specific technologies such as transactional memory? Sun Microsystems was very interested in transactional memory in the context of its now canceled "Rock" microprocessor. The basic concept behind transactional memory is to provide an alternative to lock-based synchronization by handling concurrency problems as they occur at a low level rather than having the programmer protect against them all the time.

The best solutions tend to not be silver bullets so much as incremental. Nehalem [Intel's latest microprocessor generation] in a way probably helped us more than anything in recent memory because we moved to the QuickPath interconnect and moved bandwidths up and latencies down. Larrabee [a many-core Intel microprocessor still under development] may pave the way with some innovations in interconnects. I think there may be some refinements needed. Interconnecting the processors is a classic supercomputer issue.

Transactional memory has slammed up against a very tough reality which is that hardware always wants to be finite; software solutions want to be infinite. Think there's something there. I think the people looking at transactional memory have started to make observations about locks that may end up being useful. It's funny. The mission of transactional memory is to get rid of locks but the more they looked at it the more they understood about how locks behave. There might actually be possibilities to make locks behave better in hardware.

Can we do the hardware a little differently? Not the sexiest thing in the world. But as we move from single-threaded to multi-threaded what complications are we creating things [that the hardware can help with]?

Even if you don't subscribe to the more extreme views of programming and software being in a crisis because of the move to multi-core, we're clearly in a transition. New tools are needed and programmers will have to adapt as well, to at least some degree.

November 5, 2009 6:00 AM PST

Intel's James Reinders on parallelism: Part 1

by Gordon Haff

Multicore processors are here to stay and the number of cores that we'll see packed onto a single chip is only going to increase. That's because Moore's Law is only indirectly about performance; it's directly about increasing the number of transistors. And, for a variety of reasons, turning those transistors into performance today largely depends on cranking up the core count.

There's a downside to this approach though. Programs that consist of a single thread of instructions can only run on a single core. This in turn means that they're not going to get much faster no matter how many cores a chip adds. Running faster means going multi-threaded--splitting up the task and working on the different pieces in parallel. The problem is that programming multi-threaded applications introduces complications that don't exist with single-threading.

These complications and ways to overcome them were the topic of my conversation with James Reinders at the Intel Developer Forum in September. Reinders is the director of marketing and business for Intel's Software Development Products. He's an expert on parallelism and his most recent book covered the C++ extensions for parallelism provided by Intel Threaded Building Blocks.

In part 1 of this discussion we talked about how to think about performance in a parallel programming environment, why such environments give developers headaches, and what can be done about it.

Reinders began by noting that developers fall into roughly two groups when it comes to parallel programming: those who are still concerned about ultimate performance even in a parallel world and those who are just looking for a way to deal with it at all.

The challenge is understanding what we're trying to introduce, how to use parallelism, but with programmer efficiency. Because programmers don't need yet another thing to worry about. There's plenty of those out there.

And we need to be a little more relaxed about the performance. The people who start asking me about efficiency in every last cycle used and such--I characterize them as people we need to talk to more about our high-performance computing-oriented tools that give you full control. And other people are "I don't even know how to approach parallelism." I think there is a different set of ways to talk about the problem.

The problem with this second group comes down to the fact that most programmers are used to dealing with something called "sequential semantics." A detailed description of programming semantics is a complex computer science topic but, at a high level, sequential semantics means more or less what it sounds like: instructions follow one after another and execute in the order that they are written.

If you store the number "1" in variable A, then store the number "2" in variable B, and then add them together in a third instruction, you can be confident that the answer will be "3." It won't depend on timing vagaries that might have caused the addition to happen before the stores. Most people start out programming sequentially using languages designed for that purpose.
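Here is that example as code, followed by a sketch of what changes once the stores are handed to separate threads. The `store` helper and the `values` dictionary are my own illustration, not anything from Intel's tools.

```python
import threading

# Sequential semantics: statements run in the order they are written.
a = 1
b = 2
total = a + b  # always 3

# The same work split across two threads. The stores now run concurrently,
# so the addition is only safe once we have explicitly waited for both.
values = {}


def store(name, number):
    values[name] = number


threads = [
    threading.Thread(target=store, args=("a", 1)),
    threading.Thread(target=store, args=("b", 2)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()  # without these joins, the read below could run too early

threaded_total = values["a"] + values["b"]
print(total, threaded_total)
```

The first half needs no thought about ordering at all. In the second half the ordering guarantee has to be put back by hand with `join()`, and forgetting it is exactly the sort of timing-dependent mistake that sequential programmers have never had to consider.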

Parallel programming, on the other hand, introduces concepts like data races (the answer is dependent on the timing of other events) and deadlocks (in which two threads are each waiting for the other to complete so that neither ever does). Here's Reinders:

If you've ever managed and got a bunch of people working on a project together, one of the headaches you get is coordinating with each other. What did Fred say to Sally? They're doing things out of order or whatever. Parallel programming can give you that same sort of headache.

The programming terminology you'll hear the compiler people use is "sequential semantics." One of the interesting areas is what can we do if we ensure sequential semantics. We recently acquired a team in Massachusetts who were working for a company called Cilk Arts.

Our hope is that Cilk can do a subset of what Threaded Building Blocks [TBB] can but preserve sequential semantics. We think we can do sequential semantics, do a subset of what TBB does, since we're introducing keywords into the compiler--that has some disadvantages because it's not as portable--but we think we might be able to magically give you sequential semantics and not give up performance. That's a big if.

Now why would we invest in that?

Because there are a lot of programmers who have been getting along just fine with sequential programming. But when you tell them to add this or that for parallelism, a big thing that trips them up is that you no longer obey sequential semantics; you have more than one thing running around and you get data races, deadlocks, and it doesn't feel comfortable.

Now some people will argue that you need to do these things to get good performance. We have the feeling that in some cases you don't need to take that big of a leap to get pretty good performance.

And no one's going to criticize your app on a quad core for being only 70 percent efficient.

From there we moved on to data parallelism, which focuses on distributing data across processing elements. It contrasts with the task parallelism that we commonly associate with the term parallel programming. Pervasive DataRush is one commercial product based on a data parallelism model. APL, the language with the strange symbols (for those with long memories), is often considered the first data parallel language. There have been a variety of others, often extensions to more conventional languages like C and FORTRAN, but none were widely used.
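The distinction is easier to see side by side. The sketch below uses Python's standard thread pool only to show the shape of each approach; it has nothing to do with Ct, RapidMind, or DataRush, and in standard Python the threads won't actually speed up this kind of arithmetic. The `split` helper and the sample data are my own.

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(1, 1001))


def sum_of_squares(chunk):
    return sum(x * x for x in chunk)


def split(seq, parts):
    """Split seq into `parts` contiguous chunks of near-equal size."""
    size = -(-len(seq) // parts)  # ceiling division
    return [seq[i:i + size] for i in range(0, len(seq), size)]


with ThreadPoolExecutor(max_workers=4) as pool:
    # Data parallelism: one operation, applied to different pieces of the data.
    partials = list(pool.map(sum_of_squares, split(data, 4)))
    data_parallel_result = sum(partials)

    # Task parallelism: different operations, run side by side.
    f_min = pool.submit(min, data)
    f_max = pool.submit(max, data)
    f_sum = pool.submit(sum, data)
    task_parallel_result = (f_min.result(), f_max.result(), f_sum.result())

print(data_parallel_result, task_parallel_result)
```

In the data-parallel half the programmer describes what happens to the data and the pool decides where each chunk runs. In the task-parallel half the programmer names each unit of work explicitly.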

The other thing we're looking at is data parallelism. And that's where we acquired the RapidMind team and combined them with our Ct [C for Throughput Computing] team.

Data parallelism just takes it one step further. Data parallelism is all about the parallelism in the data. So you're talking about the data when you program.

And once you start talking about the data, the tools underneath can move the data around. Leaving the data management up to the programmer [as with Cilk and TBB] turns out to be a terrific headache. This applies equally to a cluster where they don't share memory or a GPU and a CPU in the same system.

But a language like RapidMind or Ct can address that problem. And CUDA and OpenCL can too [frameworks primarily oriented towards heterogeneous processing that uses graphics cores for computing tasks] but RapidMind and Ct are at a much higher level of abstraction which means that we're betting on the idea that we can attract more developers and give up some efficiency.

Part 2 of our conversation will cover cloud computing, functional and dynamic languages, and what needs to happen with respect to programmer education.

November 3, 2009 10:01 AM PST

Red Hat debuts virtualization management

by Gordon Haff

Correction at 7:15 a.m. PST November 4: At one time, Red Hat had planned to ship an embedded KVM hypervisor based on Fedora. But the Red Hat Enterprise Virtualization Hypervisor uses the RHEL 5.4 kernel and thereby picks up the same hardware verification portfolio.

With Tuesday's release of Red Hat Enterprise Virtualization Hypervisor and Red Hat Enterprise Virtualization Manager (RHEV-M) for servers, the company has completed the first phase of a server virtualization rollout that effectively now puts KVM front and center. Red Hat released KVM commercially for the first time in September as part of Red Hat Enterprise Linux 5.4.

KVM is a server virtualization technology that Red Hat acquired when it bought Qumranet in 2008. Red Hat favors KVM over the other primary open-source hypervisor, Xen, for both business and technical reasons. (Although, as of version 5.4, Xen remains the default hypervisor for RHEL.)

The business reason is that, while Red Hat has made contributions to Xen, competitors are far more associated with the project. Novell, the owner of the only other major enterprise Linux distribution, ran especially hard with Xen early on. And Citrix, not a direct competitor but certainly a major virtualization player, bought XenSource, the commercial entity formed by Xen's creators.

From a technical perspective, Red Hat's issue is that it's hard to keep Xen and the Linux kernel in sync. Xen's a standalone hypervisor layer but it has deeply invasive hooks into the Linux kernel and, therefore, keeping the two working together takes a lot of development and testing effort. It's a bit reminiscent of how new versions of the Veritas file system had to be carefully matched to new versions of Solaris or HP-UX.

By contrast, KVM is kernel-based. This means that it is actually part and parcel of the Linux kernel rather than a quasi-independent piece of software. In part for this reason, it's KVM that is now included in the mainline Linux kernel as of version 2.6.20.

As of version 5.4, an instance of RHEL can host guest virtual machines running RHEL 5 and other operating systems including Windows Server 2008. This announcement adds Red Hat Enterprise Virtualization Hypervisor, something that is often referred to as an "embedded hypervisor." It uses the same RHEL 5.4 kernel as Red Hat's full enterprise distribution.

Embedded hypervisors have taken off more slowly than many of us expected. But all the major virtualization players offer one so Red Hat needed to as well.

From my perspective, Red Hat Enterprise Virtualization Manager is more significant. On the one hand, management is important to--indeed central to--virtualization. On the other hand, it's an area where Red Hat has lagged. CTO Brian Stevens admitted as much when we spoke at the company's financial analyst day last month, telling me that RHEV-M "has been a huge missing ingredient."

Red Hat's management tools have historically focused mostly on updating packages. This is a reflection of the broader Linux and open-source ecosystem in general. Projects like Nagios and, more recently, GroundWork notwithstanding, management doesn't play well to the strengths of open source because it's such a "high surface area" application. But Red Hat had to attack management from some angle unless it was prepared to just cede that area of differentiation and potential point of control to system makers and others.

RHEV-M is Red Hat's first step toward remedying this deficiency. It seems a necessary move especially given that KVM is likely to be used, at least initially, as part of a Red Hat software stack and therefore Red Hat pretty much has to support the tools to manage KVM if it's to gain any market traction.

That said, this is very much a first step. The initial product only manages KVM. Furthermore, the management server has to be running Windows Server 2003, which you would rightly think a rather odd decision from a company that is one of the pioneers of open source. (Apparently, this was a decision by Qumranet and Red Hat has not yet developed a version that can run on Linux.)

Red Hat has clearly prioritized getting a usable if limited product into customers' hands. They trotted out one such customer at their financial analyst day. Dave Costakos of Qualcomm was happy with what he saw. He told me that they wanted a Web-based interface, which RHEV Manager has. He also liked the integration with Active Directory and other directory systems, the role-based access controls, and the provisioning capabilities.

Overall, Red Hat's virtualization play remains less filled in than do the plays of others. But it's now started in a systematic way.

November 3, 2009 7:19 AM PST

3Leaf's modern take on NUMA

by Gordon Haff

Over the years, we've seen a variety of approaches intended to meld multiple small servers into a single larger system. 3Leaf Systems is the latest. On November 3, it introduced a Dynamic Data Center Server (DDC-Server) for AMD Opteron processors. The DDC-Server combines a custom DDC-ASIC chip with software to create a symmetric multiprocessing server with 32 6-core AMD "Istanbul" processors and 1 terabyte of memory.

The system, together with the InfiniBand switch required to interconnect the server components, 8TB of storage, and 3Leaf's software, is priced at $250,000. A smaller $99,000 version is also available. However, these systems should be thought of primarily as proofs of concept intended to create proof points with customers and to provide system makers with a tangible product. 3Leaf's go-to-market plan is to sign up system original equipment manufacturers and sell them ASIC (application-specific integrated circuit) chips and software--not to itself be a seller of systems.

The basic concept behind 3Leaf's design has quite a few antecedents.

In the 1990s, Data General and Sequent came up with large Unix server designs that connected "standard high volume" (SHV) x86 modules with cables using a protocol from Dolphin Technology called SCI. The component modules were never as standard or high volume as the SHV term implied but the approach still reduced development costs and increased the flexibility of the system relative to the more monolithic designs that characterized most large SMP servers of the day.

More recently, Virtual Iron developed a distributed hypervisor that could not only subdivide a single server in the vein of server virtualization products like VMware's ESX Server, but could also meld multiple smaller systems into large ones on the fly. (Virtual Iron later abandoned its proprietary hypervisor in favor of Xen and was eventually absorbed by Oracle.)

ScaleMP's vSMP Foundation is the current product for aggregating x86 servers that is probably most comparable to 3Leaf's. To date, it's been primarily focused on high-performance computing. The key distinction is that, unlike ScaleMP, 3Leaf uses a custom ASIC in addition to software. Both companies are primarily focused on InfiniBand as their interconnect although there is nothing architectural to prohibit the use of 10-Gigabit Ethernet over time. From a technical perspective, 3Leaf is essentially layering its own coherency protocol on top of InfiniBand. The current product uses the same socket as the AMD processor. However, 3Leaf also has a license for Intel's QuickPath Interconnect.

3Leaf says that, by developing an ASIC that gets into coherent memory transactions at the cache level, they are able to get better performance across a wider range of workloads than a purely software-based approach can.

Performance has been a stumbling block with this approach historically.

An SMP server, however constructed, is characterized by the fact that it is a shared memory architecture. This means that any processor can directly access any memory in the system. In general, this makes for a simpler programming model than distributed memory architectures, such as clusters, in which a lot of the work associated with making sure you're working with the latest data is shifted from hardware to software.
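To make the difference in programming models concrete, here is a minimal Python sketch. It is only an illustration: threads stand in for processors or cluster nodes, and the function names are made up for this example. In the first version every worker updates one shared total directly. In the second, each worker sees only the data it was handed and has to send its result back explicitly, which is the kind of bookkeeping that moves into software on a cluster.

```python
import queue
import threading


def shared_memory_sum(values, workers=4):
    """Shared-memory style: every worker adds into the same total."""
    total = {"sum": 0}
    lock = threading.Lock()  # serializes updates to the shared total

    def work(chunk):
        partial = sum(chunk)
        with lock:
            total["sum"] += partial

    threads = [
        threading.Thread(target=work, args=(values[i::workers],))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["sum"]


def message_passing_sum(values, workers=4):
    """Distributed-memory style: workers only see what they are sent,
    and results come back as explicit messages."""
    results = queue.Queue()  # stands in for the interconnect

    def work(chunk):
        results.put(sum(chunk))  # explicit "send" of the partial result

    threads = [
        threading.Thread(target=work, args=(values[i::workers],))
        for i in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The coordinator has to collect and combine the messages itself.
    return sum(results.get() for _ in range(workers))
```

Both functions return the same answer. The point is where the coordination lives: in the first it is a lock around memory everyone can see, in the second it is data movement the program has to spell out.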

How quickly a given processor can get to the memory that it needs plays a big part in a system's performance. In fact, for some workloads such as database transaction processing, memory access times can be the single factor that most affects how fast a system is. As a result, traditional large server designs incorporated expensive hardware such as crossbars to keep memory traffic flowing across the entire system quickly.

Today's small servers have equally speedy and high-bandwidth memory links--indeed their compact footprint can help to reduce latency even further. However, once you combine multiple nodes, the time it takes for a processor to access memory on another node can rise dramatically. The exact numbers depend on many factors, including what else is going on in the system at the time. But, as a rule of thumb, it takes at least twice as long to access memory on another node as if it were local--and could take several multiples of that. In other words, memory access is non-uniform; NUMA is the term often used.
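A back-of-the-envelope calculation shows why the share of remote accesses matters so much. The Python sketch below is a simple weighted average, not a model of any real system: the 100 ns local latency and the fractions are made-up inputs, and the 2x penalty is just the rule-of-thumb floor mentioned above.

```python
def effective_latency_ns(local_ns, remote_penalty, remote_fraction):
    """Average memory latency when some accesses go to another node.

    local_ns        -- latency of a local access (illustrative figure)
    remote_penalty  -- how many times slower a remote access is (2.0 = twice)
    remote_fraction -- share of accesses that land on another node (0..1)
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    remote_ns = local_ns * remote_penalty
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


# Hypothetical inputs: 100 ns local, remote twice as slow, 25% remote accesses.
print(effective_latency_ns(100.0, 2.0, 0.25))  # 125.0
# Same local latency, but a 4x penalty and half of all accesses remote.
print(effective_latency_ns(100.0, 4.0, 0.5))   # 250.0
```

This is why operating system placement and workload behavior matter: pushing the remote fraction down does as much for average latency as shaving the penalty itself.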

Over time, operating systems have gotten much better at keeping processing and associated memory physically close to each other. Certain workloads are also less sensitive to NUMA designs than others. Many HPC, analytics, and business intelligence applications involve fewer of the sort of memory updates that tend to drag down performance in NUMA architectures than does typical enterprise online transaction processing.

It's also the case that, today, large SMP is as much about having a large and flexible pool of hardware resources for server virtualization as it is about having a single large SMP image. Thus, in many respects, large SMP is increasingly about management rather than monolithic application performance. Which is one of the reasons that we're seeing a general trend towards modularity in all SMP designs.

Thus, the SHV approach to SMP system design arguably sits closer to the mainstream than it has in the past.

As 3Leaf's Shahin Khan told me, the key with this approach is that it "had better be low cost and work." Performance has to be acceptable over at least an interesting subset of workloads and there can't be a significant price premium over the constituent systems and hardware. And ultimately, for 3Leaf, success will result from convincing one or more major system OEMs that the time has arrived to add a system or systems based on this approach.

October 28, 2009 10:25 AM PDT

Cloud computing's dual identity

by Gordon Haff

Last week's virtual version of the roaming CloudCamp conference was a good opportunity to check the pulse of cloud computing's evolution.

It struck me that we continue to see two very different groups of attendees at events such as this.

One is the "clouderati," the vendors involved with cloud computing in some form or another, and sophisticated users who grok cloud computing and its implications for their organization. This crowd is so past definitional debates and analogies to the electrical grid. They just want to get on with specific issues--such as dealing with audit requirements in a world where you increasingly can't just walk into a datacenter and point to the physical server where an application is running.

The other group is still a bit fuzzy on the general concept. Does cloud computing just mean Amazon Web Services? Where does software as a service fit? Is it just a load of hype? Is it safe?

It's not hard to understand why there is a fair bit of confusion. Cloud computing has become a sort of blanket term for where computing is going. Think of it as a synonym for "computing.next." It represents a shift to an operational model in which applications don't live out their lives on a specific piece of hardware and in which resources are more flexibly deployed than was the historical norm.

Cloud computing is therefore not a single technology or even a single approach but rather a collection of technologies and approaches that collectively represent the direction that computing is headed. I see nothing wrong with this. Many of the benefits espoused for these new approaches to computing are genuine. To the degree that "cloud computing" offers a convenient rallying point to get users headed there, that seems for the good.

But specific things are easier to grapple with than general paradigms. And cloud computing started life as something fairly narrow as articulated in author Nick Carr's The Big Switch. (The irony is that, not only is the cloud computing concept bigger than the Big Switch concept of a few, huge mega-service providers, it now seems unlikely that the degree of centralization and fundamental change in economic model envisioned by Carr will happen any time soon.) Whereas today, we see cloud computing used sometimes to mean computing.next and sometimes to mean a specific technology approach that someone is either promoting or denigrating.

Cloud computing incorporates and makes use of many individual, sharply defined technologies, of course. But, increasingly, think of the broad term as applying to a way of thinking about computing rather than the specifics of how it's done.

October 23, 2009 9:31 AM PDT

Technology takes time

by Gordon Haff

There are many different technology adoption models out there. Geoffrey Moore's curve--the one that uses terms such as "Early Adopters" and "Late Majority"--is a common one. And different technologies end up getting adopted at strikingly different rates. This fascinating chart from The New York Times shows how the telephone made its way into U.S. homes only over a span of many decades while the VCR went from rare to commonplace over about a single 10-year stretch.

In general, new technologies are permeating the market faster than ever before. Still, the length of time it takes for even an ultimately successful innovation to become commercially important is routinely underestimated by lots of industry watchers. I've been guilty of this myself.

One issue is that many of us in the IT ecosystem are early adopters by nature. We're enthusiastic about the new coolness for its own sake, not just for what it's capable of. By contrast, the ultimate buyers are often more conservative and mostly want technologies that have already proven themselves. It's a bias that we as analysts try to guard against, in part by speaking with different types of end users.

Another issue is that new technologies are often more interesting in combination with other pieces than they are in isolation. To use the old cliche, the whole is greater than the sum of the parts. However, the corollary is that it takes more work and more time to bring that combination into being than it does just one component. Frederick Brooks discussed this reality in the context of bringing the IBM System/360 to market in his widely read "The Mythical Man-Month."

I bring up this topic because of something that caught my eye in a Web 2.0 Summit presentation by Mary Meeker of Morgan Stanley. She devoted a large chunk of her presentation to mobile trends, beginning with a slide that stated "Mobile = Incremental Driver of Internet User / Usage Growth." She went on to say that "Mobile Internet usage is and will be bigger than most think."

This computing growth includes Apple. She stated that "Near term, Apple is driving the platform change to mobile computing. Its mobile ecosystem (iPhone + iTouch + iTunes + accessories + services) market share / impact should surprise on the upside for at least the next 1-2 years." However, it also includes a rich set of other devices including automobile electronics and home entertainment devices. In some respects, this is the "Internet of things" as Sun Microsystems CTO Greg Papadopoulos has called it. (Although as Richard MacManus over at ReadWriteWeb suggests, the full Internet of things, including RFID sensors and the like, is something more expansive.)

The "secret sauce" in this growth? Location-based services. Meeker quoted Mathew Honan, of Wired magazine, who wrote: "Simply put, location changes everything. This one input - our coordinates - has the potential to change all the outputs. Where we shop, who we talk to, what we read, what we search for, where we go - they all change once we merge location and the Web."

What caught my eye about all this was that I remember all the enthusiasm over the imminent arrival of the mobile Web back during the first Internet build-out about a decade ago. Here's a typical press release from a company named Optus in November 2000: "Mobile phone users can locate a close-by restaurant, chemist, bank or cinema now that Cable & Wireless Optus has launched Australia's first range of sophisticated location-based services on its Wireless Application Protocol (WAP) service, Optus Networker."

There were many such claims at the time and many proclamations that "place" was the Next Big Thing.

Ultimately it appears the proclaimers were right. But it took a while. It arguably took the second or third iteration of iPhone for applications that make use of the user's location in smartphones to take off in a big way. And thereby make the promises of press releases of the year 2000 a mainstream reality.

Some of it is just technological maturity of the device and the network. A mobile browser that can access the "real" Web with reasonable fidelity and performance rather than being restricted to a dumbed-down mobile Web turned out to be one major piece.

Key too was a development environment that made it possible for many casual developers to create applications and not just a few working closely with a handset maker.

The vast amounts of data created over a number of years through various types of social media are pretty important as well. We don't mostly find nearby restaurants through formally curated data; we find them through Yelp.

In short, the rich mobile experience isn't about one thing but many. And aligning the pieces always takes time.

October 22, 2009 6:28 AM PDT

I/O virtualization's competing forms

by Gordon Haff

Server virtualization means something fairly specific. Storage virtualization is a bit more diffuse. But it's I/O virtualization that really covers a lot of ground.

At a high level, virtualization means turning physical resources into logical ones. It's a layer of abstraction. In this sense, it's something that the IT industry has been doing for essentially forever. For example, when you write a file to disk, you're taking advantage of many software and hardware abstractions such as the operating system's file system and logical block addressing in the disk controller. Collectively, these virtualization levels simplify how what's above interacts with what's below.

I/O virtualization brings these principles to the edge of the network. Its general goal is to eliminate the inflexible physical association between specific network interface controllers (NICs) and host bus adapters (HBAs) and specific servers. As a practical matter in a modern data center, this usually comes down to virtualizing Gigabit Ethernet (and 10 GbE to come) and Fibre Channel links.

Virtualizing these resources brings some nice benefits. Physical resources can be carved up and allocated to servers based on what they need to run a particular workload. This becomes especially important when the servers themselves are virtualized. I/O virtualization can also decouple network and storage administration from server administration--tasks that are often performed by different people. For example, IP addresses and World Wide Names (unique identifiers for storage targets) can be pre-allocated to a pool of servers.

That's I/O virtualization conceptually. Vendors are approaching from a lot of different directions.

For starters, like many things, I/O virtualization has its roots in the mainframe. From virtual networking within servers to channelized I/O without, many aspects of I/O virtualization first appeared in what is now IBM's System z from whence it made its way into other forms of "Big Iron" from IBM and others. Thus, many servers today have various forms of virtual networking within the box whereby virtual machines communicate with each other using internal high-performance connections that appear as network links to software.

However, I/O virtualization in the distributed systems sense first arrived in blade server designs. Egenera was the pioneer here. HP's Virtual Connect for its c-Class BladeSystem and IBM Open Fabric for its BladeCenter are more recent and more widely sold examples. And virtualization, including I/O virtualization, lies at the heart of Cisco's Unified Computing System (UCS).

Blade architectures incorporate third-party switches and other products to various degrees. However, they're largely an integrated technology stack from a single vendor. Indeed, this integration has arguably come to be seen as one of the virtues of blades. In this sense, they can be thought of as a distributed system analog to large-scale SMP.

A new crop of products in a similar vein aren't tied to a single vendor's servers.

Aprius, Virtensys, and NextIO are each taking slightly different angles, but all are essentially bringing PCI Express out of the server to an external chassis where the NICs and HBAs then reside. These cards can then be sliced up in software and divvied up among the connected servers. Xsigo is another company taking a comparable approach but using InfiniBand-based technology rather than PCIe.

Whatever the technology specifics, the basic idea is to create a virtualized pool of I/O resources that can be allocated (and moved around) based on what an individual server requires to run a given workload most efficiently.
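As a rough picture of what "a pool that can be allocated and moved around" means, here is a toy Python model. It is not any vendor's software or API: the class, the method names, the server names, and the use of bandwidth in Gbps as the unit being carved up are all assumptions made for illustration.

```python
class IOPool:
    """Toy model of a shared pool of I/O bandwidth behind a set of servers."""

    def __init__(self, capacity_gbps):
        self.capacity_gbps = capacity_gbps
        self.allocations = {}  # server name -> Gbps currently assigned

    @property
    def free_gbps(self):
        return self.capacity_gbps - sum(self.allocations.values())

    def allocate(self, server, gbps):
        """Carve a slice out of the pool for one server."""
        if gbps <= 0:
            raise ValueError("gbps must be positive")
        if gbps > self.free_gbps:
            raise RuntimeError("not enough free capacity in the pool")
        self.allocations[server] = self.allocations.get(server, 0) + gbps

    def release(self, server):
        """Return a server's slice to the pool; returns the amount freed."""
        return self.allocations.pop(server, 0)

    def move(self, src, dst):
        """Reassign a slice, e.g. when a workload migrates to another server."""
        gbps = self.release(src)
        if gbps:
            self.allocations[dst] = self.allocations.get(dst, 0) + gbps


pool = IOPool(capacity_gbps=20)
pool.allocate("web01", 4)
pool.allocate("db01", 10)
pool.move("web01", "web02")  # the workload moved, so its I/O follows it
print(pool.allocations)      # {'db01': 10, 'web02': 4}
print(pool.free_gbps)        # 6
```

The real products do this with physical NICs and HBAs sliced up in software rather than a dictionary, but the bookkeeping is the same idea: capacity belongs to the pool, not to a particular server.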

There's a final interesting twist to I/O virtualization. And that's access to storage over a network connection. While network-attached file servers are suitable for many tasks, heavy-duty production applications often need the typically higher performance provided by so-called block-mode access. For more than a decade, this has tended to translate into storage subsystems consisting of disk arrays connected to servers by a dedicated Fibre Channel-based storage area network (SAN).

However, with the advent of 10 GbE networks and associated enhancements to Ethernet protocols, we're starting to see interest in the idea of a "unified fabric"--a single infrastructure to handle both networking and storage traffic. One of the key technology components here is a protocol called Fibre Channel over Ethernet (FCoE) that allows block-mode storage access originally intended for Fibre Channel networks to traverse 10 GbE instead.
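The core idea of FCoE, carrying a Fibre Channel frame as the payload of an Ethernet frame, can be sketched in a few lines of Python. This is deliberately simplified and should not be read as a faithful implementation of the protocol: it uses the EtherType assigned to FCoE (0x8906) but leaves out the FCoE header fields, the start-of-frame and end-of-frame delimiters, padding, and the Ethernet frame check sequence. The MAC addresses and the payload bytes are placeholders.

```python
import struct

FCOE_ETHERTYPE = 0x8906  # EtherType that marks an Ethernet payload as FCoE


def encapsulate(dst_mac: bytes, src_mac: bytes, fc_frame: bytes) -> bytes:
    """Wrap a Fibre Channel frame in a (simplified) Ethernet frame."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    header = dst_mac + src_mac + struct.pack("!H", FCOE_ETHERTYPE)
    return header + fc_frame


def decapsulate(frame: bytes) -> bytes:
    """Recover the Fibre Channel frame, checking the EtherType first."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != FCOE_ETHERTYPE:
        raise ValueError("not an FCoE frame")
    return frame[14:]


# Placeholder addresses and payload, for illustration only.
dst = bytes.fromhex("0efc00010203")
src = bytes.fromhex("0efc00040506")
fc_frame = b"example Fibre Channel frame bytes"

wire = encapsulate(dst, src, fc_frame)
assert decapsulate(wire) == fc_frame
```

The storage side keeps speaking Fibre Channel. Only the transport underneath changes, which is what lets a single Ethernet infrastructure carry both kinds of traffic.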

There's more to unified fabrics than that, including alternate protocols such as iSCSI and various acceleration technologies, but for our purposes here, I'll use FCoE as a blanket term.

So what does FCoE have to do with I/O virtualization? After all, an adapter card optimized for FCoE can be virtualized alongside other NICs and HBAs. So, at first glance, you might think that FCoE and I/O virtualization were simply complementary.

At one level, you'd be right. Aprius, for example, advertises that it provides "virtualized and shared access to data and storage network resources (Ethernet, CEE, iSCSI, FCoE, network accelerators) across an entire rack of servers, utilizing the ubiquitous PCI Express (PCIe) bus found in every server."

However, considered more broadly, I/O virtualization and FCoE solve many of the same problems--that of connecting servers to different types of networks without a lot of cards and cables associated with each individual server.

Adapters that connect to converged networks will themselves converge to card designs that can handle a wide range of both networking and storage traffic. Furthermore, if Ethernet's history is any indication, prices are likely to drop significantly over time; this would make finely allocating networking resources among servers less critical.

To the degree that each server can get a relatively inexpensive adapter that can handle multiple tasks, the rationale of bringing PCIe out to an external I/O pool is, at the least, much reduced. There are still rationales for virtualizing I/O in some form--especially in an integrated environment such as blades. Cisco, for example, puts both FCoE and virtualization front-and-center with its Unified Computing System. But narrow justifications for I/O virtualization such as reducing the number of I/O cards required are significantly weakened by FCoE.

In the end, FCoE may not be I/O virtualization as such but it's closely related in function if not in form.

About The Pervasive Data Center

This blog takes a deep (and often skeptical) look at trends big and small in the world of enterprise servers, data centers, and "Yotta-scale" computing. This means also taking into account the myriad of software, networks, and devices that are driving change in (or being driven by) these back-end systems. Stories posted to this blog may also appear on Illuminata's site.

Gordon Haff is a principal IT adviser for Illuminata of Nashua, N.H. Before becoming an IT industry analyst, Gordon held a variety of product-marketing positions at Data General, spanning more than a decade. He's programmed for DOS, Windows, and Linux; builds his own PCs; and holds engineering degrees from MIT and Dartmouth, with an MBA from Cornell. He is a member of the CNET Blog Network and is not an employee of CNET. Disclosure.
