I've been an IT industry analyst for almost 10 years. I've seen many technologies come, go, or fail to even arrive in the first place. However, during that time, a few techs have emerged that play a big part in fundamentally defining how businesses do computing. Most first emerged prior to 2000, but it has been during the past decade that they've truly changed things.
1. x86 processors were already well entrenched in corporate computing by the end of the 1990s, especially in their role as the "(In)tel" part of "Wintel" servers running Windows NT. However, their dominant designer and manufacturer, Intel, was heading in a different direction to handle the inevitable transition to the 64-bit processors and operating systems needed to keep pace with growing memory requirements.
That new direction was Itanium, a clean-sheet processor design by Intel and Hewlett-Packard intended to get away from all the legacy features of x86 and--not incidentally--cut the x86-compatible processor makers out of the picture. The Itanium family remains with us but primarily as a processor for high-end HP servers. It was AMD that first added 64-bit extensions to x86, and Intel felt compelled to follow. And it is this backward-compatible version of x86, not Itanium, that became the mainstream 64-bit server processor.
2. The other big processor story of the decade is multicore. Near the end of 2000, Intel introduced the Pentium 4 processor based on the NetBurst microarchitecture. It was intended to eventually hit about 10GHz. In fact, it never got beyond 4GHz and came to be viewed as the last gasp of performance scaling through frequency.
AMD introduced its first multicore x86 Opteron processors for servers in 2005 which helped it gain market share for a time while Intel made major changes to its development plans and processes. IBM and Sun also aggressively pursued multi-core in their RISC lines. Specialty processors such as Azul's Vega and Tilera's TILE lines went even more radically multicore. In short, frequency is largely dead as a path to higher system performance, which will require a combination of more cores and specialty accelerators working in parallel.
3. When I first met Diane Greene, co-founder and then-CEO of VMware in the fall of 2000, VMware was already selling a product to developers that let them run multiple operating systems on a single workstation. But Diane was in town to pitch me on something new, a pair of new server virtualization products--GSX and ESX Server--that made it possible to consolidate multiple workloads on a single physical server and to provision them more quickly.
The basic concept goes all the way back to IBM's involvement with early developments in time-shared computing in Cambridge, Mass., during the early '60s. And all the RISC/Unix vendors of the time had their own approaches to slicing and dicing servers. However, it was VMware that brought server virtualization to the masses. Its product ran on standard x86 servers and it provided a way to consolidate workloads right at a time when IT purchases were dramatically slowing and anything that could save money was in vogue.
EMC bought VMware in 2003 for $635 million, a figure that, hard as it is to believe today, was widely viewed as an overpayment. Today, server virtualization--an area where VMware remains the 800-lb. gorilla despite Microsoft's entry--continues to fundamentally change the way IT departments think about operating their data centers. Virtualization also underpins much of cloud computing, another major developing trend.
4. Linux and other open source were a big part of the dot-com and service provider build-out of the late 1990s.
But enterprises? Not so much. This 2001 research note had to argue that Linux was, in fact, ready for serious production use. And, whether "ready for the enterprise" is a meaningful question in the abstract, the fact remains that the Linux 2.4 kernel was widely regarded as the first version deserving of that description and it wasn't released until mid-2000. IBM began its big Linux push at about the same time.
Thus, I'd argue that it's been this past decade and not the prior one that has seen Linux and open source truly become a pervasive part of computing. That's not to say that open-source has replaced all other software. But it has heavily influenced how companies do development, engage with user and developer communities, and provide access to their products--even when the software in question is proprietary.
5. My last entry has the greatest overlap with the consumer space. That's not a coincidence, given that mobile devices are a very visible example of what Citrix CEO Mark Templeton calls the "consumerization of IT."
Mobile devices encompass at least a couple of different things. The most obvious entrant is probably the smartphone--first in the guise of the BlackBerry and more recently the iPhone. We are now at the point where you can carry a bona-fide computer in your pocket, complete with GPS and other sensors, and can run applications that you install. As my colleague Jonathan Eunice has noted, it really is a transformational experience relative to, say, my older Treo. It also represents the reality of the modern smartphone that, for many, it's increasingly about mail, texting, and social media and not, you know, phoning.
However, the smartphone doesn't deserve all the limelight. The noughts have also seen the laptop computer transform. I'm not talking about the form factor so much--although Netbooks have gotten their share of attention. Rather I'm talking about the way that we can use them.
I've had laptops since the 1990s but it wasn't until about 2001 that conferences and other venues started to put up Wi-Fi networks. They worked fitfully (some things haven't changed as much as we might like), but this was the beginning of the connected laptop rather than the merely mobile laptop.
And that's why I see the smartphone and the laptop as part of the same mega-trend. It's not about a particular form factor or usage model. It's about (almost) always being connected to applications that increasingly live largely in the network.
The nice thing about standards is that there are so many of them.
This old saw is arguably less true than in years past. Today, for a lot of reasons, there's more pressure to reach agreement on one way to do a certain thing. (Think the HD DVD vs. Blu-ray debacle for an example of what happens when vendors can't agree on a single approach.)
Standards aren't a single thing. Some have been blessed with the appropriate incantations by some official or quasi-official body. Others come from an industry consortium. And still others are "de facto" (or at least began life that way)--the result of a dominant company or just a default way of doing things.
The purist will argue that just being widely used doesn't make something a standard. I agree up to a point and only use the "standard" term in this case for things that are truly ubiquitous. Contrariwise, a rigorous formal ratification process is no guarantee of success.
But some standards do win big and become part of just how IT gets done. Here are some of them.
Like many other successful standards, Ethernet has remained a fixture in local area networks for so many years in part by adapting to many successive waves of technology. First developed in the famous Xerox PARC labs in the mid-1970s, it initially ran over coaxial cable but soon moved to twisted pair cable with the 10 Mbit/second generation. 10 Gbit/second Ethernet is now starting to roll out along with a variety of additions to the specification that make it more suitable as a high-performance unified fabric.
Ethernet's initial success resulted in no small part from coordinated standardization efforts beginning in the IEEE. This helped it beat out alternatives, most notably IBM's Token Ring. Over time, Ethernet's ubiquity and the cost benefits provided by this volume helped it largely stave off server interconnect challengers. InfiniBand has had wins in high-performance computing and certain other clustering applications, but it didn't displace Ethernet as a "server area network" as early promoters had hoped.
PCI, Peripheral Component Interconnect, had its beginnings as an Intel-developed bus for connecting internal cards within systems. The version 1.0 spec came out in 1992. Given the ubiquity of PCI these days, it's easy to forget that it only replaced a plethora of other busses both standardized and proprietary in x86 and, later, large Unix servers based on other processors over the course of nearly a decade.
Nor was the process steady. Although PCI was initially introduced in part to replace the VESA Local Bus for graphics cards--which it eventually did--PCI was itself replaced by AGP (Accelerated Graphics Port) for a time prior to the PCI Express generation.
PCI Express makes for an interesting case study in the marketing of standards. With technology bumping up against the limits of parallel I/O busses like conventional PCI, the Arapahoe Working Group--spearheaded by Intel--started pushing a new serial interconnect standard in about 2001. Arapahoe's success was by no means pre-ordained. AMD's HyperTransport was one alternative among several.
Arapahoe required hardware that was largely different from PCI but it was compatible with PCI's software model in a number of respects. And this was enough to get Arapahoe adopted by the keeper of the PCI standard, the PCI-SIG, and get the SIG's imprimatur on what would now be called PCI Express. And that helped make it the obvious heir to PCI. Names matter. (Here's a more detailed accounting of PCI Express and its history.)
It's easy to forget just how painful it could be, in the years before USB (Universal Serial Bus), to connect external peripherals to a computer system. RS-232, a long-used and successful standard in its own right, was the most common way. It was also a way that could easily devolve into examinations of cable pin-outs, interrupt channels, and memory addresses.
USB was a cooperative effort by a group of large technology vendors who founded a non-profit corporation to manage the specification. Version 1.0 was introduced in 1996. Now up to version 3.0, USB has become the standard way to connect external peripherals to PCs; it's also commonly used on servers for devices such as printers.
There's a spec for wireless USB but, like other standards intended to connect peripherals to computers wirelessly, it's never taken off. The current such "personal area network" getting the most buzz is My WiFi from Intel.
USB's primary competition has been FireWire, Apple's name for IEEE 1394. Unlike USB, it does not need a host computer and is faster than the USB 2.0 generation. However, it didn't catch on widely in the computer industry outside of Apple (which is phasing it out in favor of USB) and video equipment.
TCP/IP refers to the combination of two protocols: Transmission Control Protocol and Internet Protocol. Together, they are among the most important pieces of software underpinning the Internet, which transitioned to using TCP/IP in 1983. Work on TCP began under the auspices of the Defense Advanced Research Projects Agency (DARPA) a decade earlier but, along the way, the software stack was re-architected to add IP as the early Internet grew.
Like many of the Internet's building blocks, TCP/IP was firmly entrenched before commercial interests got involved to any significant degree and, indeed, before most of the world at large had any real notion of the Internet's existence. The general public came to know the Internet through the World Wide Web, an outgrowth of Tim Berners-Lee's development of HTML at CERN, in the 1990s. Thus HTML, as well, is a key standard.
At the time that TCP/IP was gaining momentum, the International Organization for Standardization (ISO) spearheaded a large project to standardize networking. The "OSI model" remains the standard way to think about layers of the networking stack. If you talk about a switch being "Layer 4," you're using OSI terminology. But the specific protocols developed to go with the model were never widely used. (TCP/IP largely maps to the layers defined in the OSI model.)
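For readers who want the layer numbering spelled out, here is a minimal Python sketch of the seven OSI layers and where a few of the standards discussed here are conventionally placed. The dictionary names and the `describe_layer` helper are my own illustrative choices, and the protocol placements follow the usual textbook mapping rather than anything formally specified.

```python
# The seven layers of the OSI reference model, numbered bottom to top.
OSI_LAYERS = {
    1: "Physical",
    2: "Data Link",
    3: "Network",
    4: "Transport",
    5: "Session",
    6: "Presentation",
    7: "Application",
}

# Conventional (textbook) placement of a few familiar standards.
# Ethernet also defines physical-layer details; it is listed at
# layer 2 here for simplicity.
EXAMPLES = {2: "Ethernet", 3: "IP", 4: "TCP"}


def describe_layer(number):
    """Return a short human-readable description of an OSI layer."""
    name = OSI_LAYERS[number]
    example = EXAMPLES.get(number)
    suffix = f", e.g. {example}" if example else ""
    return f"Layer {number} ({name}){suffix}"


print(describe_layer(4))  # Layer 4 (Transport), e.g. TCP
```

So a "Layer 4" switch is one that makes forwarding decisions using transport-level information, such as TCP port numbers, rather than just network or link addresses.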
The x86 architecture is perhaps the canonical example of a de facto standard driven primarily by a single vendor: Intel. Microsoft Windows is also in the running, but it was very arguably x86's ubiquity in a segment of the market open to relatively low-cost packaged software that made the rise of Windows possible. Over the past decade, AMD has also driven x86 innovations--most notably 64-bit extensions. However, it was Intel that had the biggest hand in shifting the industry from a structure in which each company did everything from fabricating processors to writing operating systems to developing databases to one in which different companies tend to specialize in one part of the technology ecosystem.
x86 emerged as a dominant chip architecture for a variety of reasons. IBM designed Intel's 8088 into the first important business PC. It got this win and others at a time when the world was rapidly computerizing. And Intel optimized itself to ride key technology trends while divesting itself of businesses, such as memory, as they commoditized.
Finally, here are a few others that could make a list like this one:
Wi-Fi played a big role in making personal computers more mobile--which is why Intel pushed it so hard.
VGA is the computing video standard that finally helped merge a rather splintered landscape and had a good long reign. (The latest video interconnect trend is a shift to HDMI--representing a coming together of computing and consumer electronics standards.)
SCSI was the first storage interconnect to merge in a big way a disparate set of existing connection schemes, both proprietary and more or less standardized. However, storage has remained an area where different standards are used for different purposes. That's changing to a degree with SATA, however, which we now see in both PCs and data centers.
SAN FRANCISCO--General manager of Intel Architecture Group Sean Maloney's announcement of a reference design for a "micro server" during his Tuesday afternoon keynote at the Intel Developer Forum brought me a sense of deja vu.
Intel's Sean Maloney holding microserver. (Credit: Intel)
He disclosed "a new ultra-low-voltage Intel Xeon 3000 series processor featuring a TDP (Thermal Design Power) of only 30 watts. To complement the broad range of dense and power-optimized platform offerings, Intel also demonstrated publicly for the first time a single-socket 'micro server' reference system which will help enable micro server innovation and future specification." Intel plans to ship the 30-watt dual-core chip in Q1 of 2010; a 45-watt quad-core version is set to ship immediately.
A reference system is primarily intended to demonstrate a concept. It provides a hands-on experience for partners and customers and therefore an opportunity to experiment with and fine-tune the basic approach. The microserver reference design will accommodate 16 server modules in a 5U-high (8.75-inch) chassis. The server boards are approximately 8 inches by 4.5 inches.
Jason Waxman, the general manager of high density computing at Intel, told me that they see the primary target for this class of system as "hosting companies that do a lot of white boxes." White boxes are systems that are often assembled in-house from component parts such as motherboards and cases. Waxman added that such companies nonetheless want many of the features associated with servers--such as memory with error correcting code (ECC).
In Intel's view of the world, microservers very much target service providers and companies that host busy Web sites and otherwise are associated with high-scale network computing. It sees this market as distinct from large high-performance computing (HPC) installations. Vendors such as HP tend to treat network computing and HPC as more of an overlapping customer group.
My deja vu when it comes to microservers relates to the fact that we've seen them before. They used to be called blades.
That's not to say that blade servers don't already exist today, but they've largely evolved into a much different concept from how they were initially conceived. The blades sold today by the likes of Cisco, Dell, HP, and IBM are about virtualization and integration. They pull together computing, networking, and storage and tightly integrate them both physically and through software. They are, in a sense, a form of scale-out consolidation.
Sun has largely eschewed this integration with their blade product line. However, Sun blades are heavily focused on high-performance computing--even to the point of integrating the HPC-centric InfiniBand interconnect on some of its products.
Rather, microservers hark back to the days of RLX Technologies, the company that did the most to promote blade servers during the Internet boom of circa 2000. Microservers are simply thin servers--compact, cheap, and simple. They provide cable simplification. They let hosting providers allocate low-cost physical servers to customers who don't want to share using virtualization.
Microservers bring blades back to their roots. Everything old is new again.
SAN FRANCISCO--The broad outline of Intel CEO Paul Otellini's keynote speech at the Intel Developer Forum on Tuesday was largely familiar. A single Intel Architecture (IA--which is to say x86) spanning servers in the data center to electronics embedded in a television.
This is a self-serving argument coming from Intel. After all, Intel already holds commanding share throughout much of the traditional PC and server space. Translating that success into newer and developing areas of the market where Intel has not historically played--or where, in many cases, the market has not even historically existed--would be a huge win.
But Intel argues that it's not purely a matter of its own interests. Rather, developers and, ultimately, end users benefit from an architecture spanning the small to the large because it lets them leverage common tools and other software.
In the past, one of Intel's proof points for this claim was to demonstrate issues associated with browsing Web sites on smartphones and other devices running non-IA processors. However, such an argument wouldn't be very convincing today in the light of the generally high-fidelity browsing experience offered by products like the iPhone despite the fact that they don't use IA processors.
Intel even undermines its own argument for commonality when it admits--as Otellini did in his keynote speech--that "handhelds have to rethink the user experience," a comment followed by a demo of a prototype interface running on Moblin. Moblin is an open-source project focused on building a Linux-based platform optimized for the next generation of mobile devices.
Commonality as a benefit and principle is hard to argue against in the abstract. But handhelds differ in many ways from PCs. User experience, given differences in screen size and the way users interact with devices that don't have a full-size keyboard, is one obvious area. However, optimizations around power usage, performance, and component integration are also much different.
In short, software that runs across a wide range of device form factors and types will hardly be common across that range even if the underlying processor architecture is. At the same time, many of the software technologies visible to both developers and users--including Flash, browsers, and Linux--increasingly span a range of processor architectures.
None of this should be taken to suggest that Intel's Atom--the processor family that's spearheading the company's push into Netbooks, handhelds, and consumer electronics--won't succeed. Perhaps as Otellini suggested, in five years, Intel may indeed sell more system-on-a-chip (SoC) processors based on its Atom processor than traditional microprocessors.
However, to the degree that Intel succeeds in this area of the market, it won't primarily be because Atom is x86. It will be because Atom beats out its competitors on metrics such as power efficiency, cost, size, and the ability of Intel partners to leverage it for their own custom designs.
A good software development framework on Atom matters too and building from an IA foundation will help there. But ultimately it's about the chip, not the architecture.
IBM's industry analyst meeting last week in Austin, Texas, covered the present and the future of its Power line. This is the system lineup once called the RS/6000 and pSeries into which was more recently folded the iSeries (previously AS/400, System 36, etc.) to form a new family called IBM Power Systems.
For our purposes here I am going to focus on Power in the guise of IBM's RISC-based lineup running a combination of AIX (IBM's flavor of commercial Unix) and Linux (either natively or using PowerVM Lx86 to run x86 Linux applications). IBM i, the successor to OS/400, also runs on the unified Power line.
IBM is slated to publicly unveil some of the details of Power7, the next generation in the Power microprocessor lineup, at the Hot Chips conference at Stanford University on Tuesday. At the top level, IBM describes Power7 as having eight cores per chip, four threads per core using simultaneous multithreading (SMT), and two to three times the per-chip performance of Power6 using the same amount of power.
Specifics of the systems based on Power7 will come later. IBM did say it'll build Power7 systems that generally add the required memory, bandwidth, networking, and so forth to keep the systems balanced.
Beyond future hardware, though, there were a few themes from this meeting worth highlighting.
IBM's Unix business is a smooth running machine at this point. We repeatedly saw a chart detailing IBM Unix Server revenue share gains at the expense of HP and Sun. The three companies held roughly equal share at the start of 2005; today IBM holds about 37 percent of the total market, about 10 percent more than those competitors.
One can always argue about which particular slicing and dicing of data most accurately represents the state of the market--of course, each vendor chooses the view that paints it in the best light--but it's hard to argue against the idea that IBM Power is on a nice upward trajectory.
Part of this is IBM's processor road map. One of the corollaries of Moore's Law is that time to market is performance. What would be a truly compelling processor in Year X is much less so in Year X+1 and even less so in Year X+2. And IBM Power development has been on a solid three-year cadence for major releases combined with interim speed bumps. Competitively, Sun's "Rock" has been reportedly canceled and Intel's next major iteration of Itanium, "Tukwila," has been much delayed.
However, it's also the case that IBM has simply made advancing Power Systems a real priority and has systematically put the resources in place to make that happen. I'm inclined to argue that this focus on scale-up, mission-critical, and integrated stacks, which has helped lead to Power's success and the mainframe's renaissance, has worked against IBM in the less differentiated segments of the x86 business. (IBM does bring many of the same concepts and even technologies to x86 with its higher-end eXA products.) Be that as it may, when it comes to Power Systems, IBM's understanding of the needs of large enterprise data centers for their core applications has paid off.
Some of this may seem unremarkable. It is IBM we're talking about here after all. However, the idea of a Unix server as an integrated stack or solution is actually a relatively new one in historical terms. Before x86 servers became so ubiquitous it was the Unix server that epitomized the roll-your-own style of computing. And Sun, until quite latterly, painted lack of any forced software (much less services) integration as a virtue.
Thus, while Power Systems are certainly still widely used in high-performance computing where the ethos still tends toward more home-grown customization and integration, Power Systems have generally evolved from a traditional Unix server mindset to one more familiar to mainframe buyers and operators. This isn't to say that Power Systems slavishly imitates all the approaches that its System z mainframe brethren take. After all, if a buyer wants a mainframe, IBM already makes one; no need to re-create a second version.
But concepts such as the benefits of owning many levels of the hardware and software stack are twined throughout Power Systems presentations--and this is very much a mainframe characteristic.
Perhaps this shines through most clearly with virtualization, which is a central theme of Power Systems discussions.
- Power servers are virtualized. That doesn't mean that they have an embedded hypervisor that can be booted up or that they're factory-loaded with virtualization in some way. Rather, even if you choose to run just one copy of an operating system, you're running it virtualized. IBM's 6 million transactions per minute on the TPC-C benchmark was on a virtualized system. Virtualization is inherent in the processor and firmware in addition to the IBM management software that sits on top.
- Power Systems users make use of virtualization. As measured by the sales of PowerVM--IBM's suite to deploy, manage, and utilize multiple VMs on a server--about 66 percent of Power customers use or at least plan to use the features of virtualization at some level. (PowerVM comes in several editions; more money gets you more features.) Contrast this with x86 where, for all the legitimate interest in and excitement about virtualization, the penetration with new servers is still something in the 15 percent or so range.
- The largest Power servers are huge. With rare exceptions and bragging-rights benchmarks notwithstanding, big servers these days aren't about running one workload or even about running one operating instance. It's about running many, many workloads with the appropriate combination of workload management techniques to enforce separation, provide flexibility, and offer mechanisms to guard against failures of various kinds. Virtualization (in several forms) is the foundation for all of this.
It wasn't that long ago, even by the standards of the computer industry, that IBM's RISC/Unix lineup was in danger of becoming a sideline. IBM dabbled in Unix "unification" efforts and with Intel's Itanium processor, and invested heavily in work associated with scaling up Linux. These efforts often seemed aimed at marginalizing its own RISC/Unix processor and operating system development work over the long term. However, in 2001, IBM released Power4. It was a major step forward in processor performance, and had two CPUs per processor die--then an unusual feature. Around the same time, IBM also doubled down on the development of its AIX flavor of Unix.
The result of those decisions, combined with consistent execution, is a very good one for IBM.
At Tuesday's Hot Chips conference IBM is scheduled to take the wraps off Power7, its next generation of RISC microprocessor. This is a big deal for IBM because Power is the foundation for its AIX Unix operating system, which has been one of the stars of its server portfolio in recent years. Power also supports the IBM i operating system and can also run Linux either natively or in an x86 binary translation mode that IBM acquired from Transitive. (Transitive is the company that developed the "Rosetta" technology that Apple used for the PowerPC to Intel transition.)
Modern microprocessors are incredibly complex machines. And major iterations, such as Power7, incorporate a multitude of new features, approaches, and techniques that are collectively far beyond the scope of a piece such as this to describe. Therefore, rather than trying to touch on everything, I'm going to touch on just a few aspects of the new processor generation that struck me as particularly noteworthy.
Power7 bumps both the number of cores and the performance per core over its predecessor. Its eight cores each support up to four simultaneous multithreading (SMT4) threads for a total of 32 threads per chip. SMT is a technique that helps make better use of the many execution units within a core by reducing the amount of time that software spends waiting for resources to become free. (One of the big issues with modern processor design is that some parts of the system run much more slowly than others so it's easy for a given thread to effectively create a roadblock while it's waiting; SMT is one way to alleviate this problem.)
This relative focus on multi-threading is a considerable departure from IBM's current Power6 which, at its 2006 unveiling, showed a focus on processor frequency and hefty individual cores at a time when radical multi-core designs were grabbing the limelight. Despite its core count increase, Power7 continues to pay attention to per-core performance, but it's through techniques other than frequency; IBM says that the Power7 core has higher performance at lower frequency than the Power6 core.
One of these techniques is the aforementioned SMT4, coupled to an increase in the number of execution units per core. Power7 also reaches back to earlier Power playbooks and reintroduces out-of-order (OoO) execution, which was temporarily shelved for the Power6 generation. OoO execution can be thought of as a complementary technique to SMT that lets the processor skip over instructions that aren't ready to be processed because they are waiting on data.
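To make the idea of skipping over stalled instructions concrete, here is a toy Python sketch. It is in no way a model of Power7: the one-instruction-per-cycle issue rule, the instruction names, and the latencies are all assumptions I've invented for illustration. It simply compares how long a short instruction sequence takes when the processor must issue strictly in program order versus when it may issue any instruction whose inputs are ready.

```python
# Each instruction: (name, names of results it depends on, latency in cycles).
# Hypothetical sequence: a slow load, one instruction that needs the loaded
# value, and two independent adds that need nothing.
PROGRAM = [
    ("load_a", [], 4),
    ("use_a", ["load_a"], 1),
    ("add_b", [], 1),
    ("add_c", [], 1),
]


def run(program, out_of_order):
    """Return the cycle at which the last result becomes available.

    Toy model: at most one instruction issues per cycle, and a result is
    available `latency` cycles after its instruction issues.
    """
    ready_at = {}            # result name -> cycle it becomes available
    pending = list(program)  # instructions not yet issued, in program order
    cycle = 0
    while pending:
        # In-order issue may only consider the oldest pending instruction;
        # out-of-order issue may consider every pending instruction.
        window = pending if out_of_order else pending[:1]
        for instr in window:
            name, deps, latency = instr
            if all(d in ready_at and ready_at[d] <= cycle for d in deps):
                ready_at[name] = cycle + latency
                pending.remove(instr)
                break  # only one issue per cycle
        cycle += 1
    return max(ready_at.values())


print("in-order:    ", run(PROGRAM, out_of_order=False))  # 7
print("out-of-order:", run(PROGRAM, out_of_order=True))   # 5
```

In the in-order run, the two independent adds sit behind `use_a` while it waits on the load. In the out-of-order run, they fill cycles that would otherwise be wasted, which is the whole point of the technique.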
Striking a balance between single-core and total-chip performance is one aspect of a general "balanced design" theme. Another is bandwidth. Each chip has dual-DDR3 memory controllers for a total of 100GB/sec of sustained memory bandwidth per chip. Scalability ports built into each chip are expandable to systems with a total of 32 sockets with 360GB/sec SMP bandwidth per chip.
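The figures IBM has given multiply out as follows. This is just a back-of-the-envelope Python sketch using the numbers quoted above; the variable names are mine, and it assumes one Power7 chip per socket, which IBM's system-level details may or may not bear out.

```python
# Figures quoted by IBM for Power7.
CORES_PER_CHIP = 8
THREADS_PER_CORE = 4  # SMT4
MAX_SOCKETS = 32

# Assumption for this sketch: one chip per socket.
threads_per_chip = CORES_PER_CHIP * THREADS_PER_CORE
max_cores = CORES_PER_CHIP * MAX_SOCKETS
max_threads = threads_per_chip * MAX_SOCKETS

print(threads_per_chip)  # 32
print(max_cores)         # 256
print(max_threads)       # 1024
```

Under that assumption, a fully populated 32-socket system would present 1,024 hardware threads to software, which helps explain why memory and SMP bandwidth get so much attention in the design.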
Another aspect of balance is the design of the cache hierarchy, the memory physically near the processor that keeps frequently and recently used data near the processing units so that it can be accessed faster. Perhaps most notable is that there's a 32MB shared Level 3 (L3) cache in the middle of each chip. In the past, IBM has often implemented L3 caches as a separate die on a multi-chip module (MCM). This provides lots of room for the cache but means that the memory is physically further away (and therefore often slower) and demands a lot of pins on the processor package to communicate with it.
Power7 takes a different approach. It's the first major commercial processor to implement an on-chip L3 cache using embedded DRAM (eDRAM). Caches are more typically constructed from static RAM (SRAM), which is faster and doesn't need to be refreshed on an ongoing basis but requires six transistors per device, rather than one for DRAM.
IBM estimates that the eDRAM has a 6:1 latency improvement for L3 accesses relative to an external L3. Relative to an internal SRAM array, eDRAM takes about one third the space and consumes about one fifth the standby power. As for performance, IBM characterizes it as "almost as fast" and says that it handles the memory refreshes required by DRAM--memory contents have to be periodically rewritten or they will decay--during "windows of opportunity" and generally won't have much of an impact on system performance.
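A rough count shows why the cell type matters at this cache size. The Python sketch below uses only the 32MB capacity and the six-versus-one transistors-per-cell figures given above. It deliberately ignores tags, error-correction bits, peripheral circuitry, and the storage capacitor each DRAM cell needs, so treat it as an order-of-magnitude illustration and not a statement about the actual Power7 die.

```python
# 32MB shared L3 cache, data bits only.
L3_BYTES = 32 * 1024 * 1024
data_bits = L3_BYTES * 8

SRAM_TRANSISTORS_PER_BIT = 6   # classic six-transistor SRAM cell
EDRAM_TRANSISTORS_PER_BIT = 1  # one transistor (plus a capacitor) per DRAM cell

sram_transistors = data_bits * SRAM_TRANSISTORS_PER_BIT
edram_transistors = data_bits * EDRAM_TRANSISTORS_PER_BIT

print(f"data bits:         {data_bits:,}")          # 268,435,456
print(f"SRAM transistors:  {sram_transistors:,}")   # 1,610,612,736
print(f"eDRAM transistors: {edram_transistors:,}")  # 268,435,456
```

Note that the raw count differs by a factor of six while IBM quotes about one third the space. That is a reminder that transistor count is not the same thing as area; the capacitor and supporting circuitry take room too.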
As is increasingly the norm with microprocessor designs, power management also plays a big role in Power7. It's also an area where microprocessor designers are still learning. Power6 placed considerable focus on a feature called power gating, which effectively turned off portions of a core when they weren't being used. Power7's top power-saving mode, sleep, is less aggressive. It drops the voltage to the minimum level required to retain state. With the 45nm process used by Power7, IBM says that this almost eliminates leakage current and provides most of the power benefits of turning off the power entirely while saving a lot of verification complexities.
Finally, as the processing power of chips and the servers built on them grow, so too does the need to provide a level of resiliency against errors both transient and permanent in the literally billions of electronic features that make up these systems. It may be a cliche to note that if you're going to put a lot of eggs in one basket, you need to protect that basket well--but it's no less true for that.
Many of the new Power7 reliability features are focused on memory. For example, there's full X8 "chip-kill" with 64 byte error correcting code (ECC). This means that a full memory chip on a DIMM can die and the data can be steered over to a spare device. For system partitions tasked with particularly critical workloads, Power7 can also do selective memory mirroring--think RAID 1 for memory. Power7 also just generally keeps building on error-checking and failover features; this includes the new ability to dynamically fail over if the main oscillator (clock) associated with the chip fails.
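To make the "RAID 1 for memory" analogy concrete, here's a minimal sketch of the general idea of mirroring: every write lands in two copies, and a read falls back to the second copy if the first fails its check. The class name, the dictionary-based "memory," and the use of a CRC as a stand-in for hardware ECC are all assumptions made for illustration; this is not how Power7 implements the feature.

```python
import zlib


class MirroredMemory:
    """Toy model of mirrored memory: every write goes to two copies."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}

    def write(self, address, data):
        # The CRC stands in for hardware error detection (an assumption).
        record = (data, zlib.crc32(data))
        self.primary[address] = record
        self.mirror[address] = record

    def read(self, address):
        # Try the primary copy first, then fall back to the mirror.
        for copy in (self.primary, self.mirror):
            data, checksum = copy[address]
            if zlib.crc32(data) == checksum:
                return data
        raise RuntimeError("both copies failed their check")


mem = MirroredMemory()
mem.write(0x10, b"payload")

# Simulate a fault by flipping one bit in the primary copy only.
good, checksum = mem.primary[0x10]
mem.primary[0x10] = (bytes([good[0] ^ 0x01]) + good[1:], checksum)

print(mem.read(0x10))  # served from the mirror copy
```

The cost is the same as with disk mirroring: half the capacity goes to the second copy, which is presumably why it makes sense to apply it selectively to the most critical partitions.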
At a high level, Power7 shares a number of general design philosophies and directions with microprocessors from other vendors, including those from AMD and Intel that are more associated with scale-out designs and redundancy at the software level. This, in part, reflects that all vendors are fighting the same physical laws and are largely constrained by the same fabrication technologies. We see a general shift toward multi-core, an increased focus on power efficiency, and a need to protect against the inevitable glitches that affect ever smaller transistors--it doesn't matter who the vendor is.
That said, Power7 has a different design center than the scale-out standard-bearers. While not literally a mainframe, its focus is very much the same sort of resiliency and performance balance at high vertical scale.
We most associate hypervisors and virtualization with servers--from their beginnings as tools for development and testing, through their widespread adoption as a means to reduce the number of physical servers needed, to their current stage as a foundation for dynamic IT architectures. Virtualization on the client side has been more of a niche, although application virtualization continues to grow in importance and some specific uses, such as running Windows applications on Macs, have proven quite popular.
As for embedded devices--that is, special-purpose computers--virtualization has had essentially no impact.
That's beginning to change. Wind River, one of the major players in embedded operating systems and recently purchased by Intel, today announced the availability of Wind River Hypervisor. From the press release:
Wind River Hypervisor enables virtualization for devices across a broad range of market segments, including aerospace and defense, automotive, consumer devices, industrial, and networking. Within these markets, embedded developers are adopting hypervisors to enable the replacement of multiple boards or CPUs with a single board and/or a single CPU, create innovative new devices that leverage multiple operating systems, and reduce complexity when integrating multicore processors. The benefits of using the Wind River Hypervisor include reduced hardware costs and power consumption, opportunity for innovation, and accelerated time-to-market.
This hypervisor can be employed in a number of different ways on a multicore processor. For example, it can be used to run multiple operating system (OS) and application instances on a single processor as a way of enabling single-threaded applications to access multicore performance.
However, what's probably the most interesting use case is consolidating different operating systems on a single processor--and thereby potentially reducing the need for otherwise separate chips or devices. This hypervisor targets two primary operating systems: Wind River's VxWorks and Wind River Linux. To understand why mixing these two might be interesting and useful, consider the differences between them.
VxWorks is Wind River's proprietary "hard real-time" operating system. It's widely used in places like aerospace and defense (think radar, avionics, and so forth) and the data processing of network gear.
Real-time, in general, refers to operating systems and software that have a very predictable response time to events. Predictability runs in opposition to overall throughput so commercial operating systems have historically not been real-time operating systems. Hard real-time typically refers to the need to have truly guaranteed response with the possibility of failure or damage if a response is not made in time.
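The distinction is easier to see with numbers. In the sketch below, the response times are made up purely for illustration, and the function name and the 10-millisecond deadline are likewise assumptions. The point it shows is that a hard real-time requirement is a statement about the worst case, not the average.

```python
from statistics import mean


def meets_hard_deadline(response_times_ms, deadline_ms):
    """Hard real-time: every single response must arrive by the deadline."""
    return max(response_times_ms) <= deadline_ms


# Made-up response times in milliseconds, for illustration only.
throughput_tuned = [1, 1, 1, 1, 1, 1, 1, 1, 1, 30]  # fast on average, one long pause
realtime_tuned = [4, 4, 5, 4, 5, 4, 4, 5, 4, 5]     # slower on average, tightly bounded

DEADLINE_MS = 10
for name, samples in [("throughput-tuned", throughput_tuned),
                      ("real-time", realtime_tuned)]:
    print(name,
          "mean:", mean(samples),
          "worst:", max(samples),
          "meets deadline:", meets_hard_deadline(samples, DEADLINE_MS))
```

The throughput-tuned system wins on average response time but blows the deadline once, which is exactly the kind of outcome a hard real-time system can't tolerate.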
Wind River Linux is the company's version of Linux optimized for embedded devices. In 2007, Wind River acquired FSMLabs, a real-time Linux vendor, to augment its Linux efforts. Wind River earlier had a partnership with Red Hat, now discontinued.
Without going into all the gory details of schedulers and so forth, suffice it to say that the real-time characteristics of Linux have improved considerably over the years. Furthermore, various patches can further optimize Linux for real-time uses. Today, we see Linux in applications (such as this IBM/Raytheon deployment) that would have historically been well outside the realm of even customized general-purpose software.
That said, an operating system like VxWorks retains a specialized focus on real-time and comes in versions that carry stringent certifications. Thus, there are many situations where it may make sense to use VxWorks for those parts of an application that require those certifications or are otherwise better served by the specialized real-time OS. At the same time, Linux may be a better fit for other parts of the application that can leverage the Linux ecosystem for components such as user interfaces.
In addition to having different technical requirements, embedded systems make for a somewhat different virtualization use case than in enterprise IT. In embedded, operating systems remain far more specialized and bespoke and this makes the ability to mix and match them on increasingly powerful and multicore processors very useful. This is another face of increasingly pervasive virtualization.
I've been getting a fair number of questions about multi-threading the past couple of weeks. The reason is that Intel has been previewing its "Nehalem EX" Xeon processor in advance of Advanced Micro Devices' six-core "Istanbul" CPU launch. Intel's Nehalem generation has simultaneous multi-threading (SMT)--which Intel calls Hyper-Threading (HT)--while Istanbul does not.
I wrote about this topic in depth a couple of years back in "Gradations of Threading," but it's worth reviewing in the context of these new server processors.
First, a little terminology.
A thread is a sequence of instructions that can execute in parallel with other threads. The details of what exactly constitutes a thread and the relationship between threads and other structures such as processes vary by operating system. However, for our purposes here, think of a thread as an independent task.
Figure: Simultaneous multi-threading. (Credit: Illuminata)
A core is, in most respects, a complete processor that includes all the hardware such as execution units, registers, and so forth required to execute a sequence of instructions. Although multiple cores on a single die or in a single package (i.e. a chip or socket) may share certain resources such as cache memories, logically each core is a full central processing unit (CPU). That multiple cores are packaged together today is essentially an implementation detail that relates to getting the best performance out of the most economically sized silicon die.
Absent multithreading, each core can execute one thread at a time, running that thread until it has completed or until the operating system scheduler swaps it out for another thread.
SMT changes that 1:1 relationship. On a processor with SMT, more than one thread can execute on a single core at the same time--in the case of HT, it's two threads per core.
SMT potentially allows a processor to be more efficiently utilized. The reason is that modern microprocessors have multiple execution units within each core. For example, they have separate logic to handle integer operations and floating-point operations. Thus, in principle, if a thread with mostly integer operations runs concurrently with a thread that mostly crunches floating-point numbers, we could keep the processor busier by running both threads at the same time than we could running them sequentially.
The other main benefit is to hide memory latency. CPUs have to operate on data and that data has to ultimately come from memory or disk. Computer designs incorporate all sorts of techniques--such as caches and prefetching--to keep data close to processors in time and space. Nonetheless, processors still spend a lot of time waiting for data to arrive from relatively pokey memory. SMT lets a CPU quickly switch away from a thread that's sitting idle waiting for associated data to arrive.
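A toy model makes the latency-hiding argument concrete. The sketch below assumes a core that can issue work from only one thread per cycle, with each thread alternating a fixed burst of compute and a fixed memory stall. The function name, the cycle counts, and the "first ready thread wins" scheduling rule are all illustrative assumptions, not a description of how Hyper-Threading or any other real SMT implementation works.

```python
def core_utilization(num_threads, compute_cycles, stall_cycles, total_cycles=10_000):
    """Fraction of cycles in which the core does useful work (toy model)."""
    # Per hardware thread: compute cycles left in the current burst, and
    # cycles left to wait on an outstanding memory access.
    compute_left = [compute_cycles] * num_threads
    stall_left = [0] * num_threads
    busy = 0
    for _ in range(total_cycles):
        # Pick the first thread that is not waiting on memory, if any.
        ready = next((t for t in range(num_threads) if stall_left[t] == 0), None)
        # Outstanding memory accesses make progress every cycle.
        stall_left = [max(0, s - 1) for s in stall_left]
        if ready is None:
            continue  # every thread is stalled, so the core sits idle
        busy += 1
        compute_left[ready] -= 1
        if compute_left[ready] == 0:
            # Burst finished: the thread now waits on memory.
            compute_left[ready] = compute_cycles
            stall_left[ready] = stall_cycles
    return busy / total_cycles


# Hypothetical workload: 20 cycles of compute, then an 80-cycle memory stall.
for threads in (1, 2):
    print(threads, "thread(s):", core_utilization(threads, 20, 80))
```

With these deliberately exaggerated numbers, one thread keeps the core busy 20 percent of the time and a second thread doubles that. Real workloads stall less often and less uniformly, and the threads also compete for caches and execution units, so actual gains will differ.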
SMT is therefore essentially a technique to use a processor more efficiently. It does not itself add execution resources to a core. And, in fact, the duplicated hardware and other logic that SMT requires to function (such as registers) takes space away from implementing other features (such as larger caches) that could themselves provide alternative ways to boost chip performance.
Intel's HT implementation--a fairly "lightweight" approach relative to IBM's on its Power processor--uses on the order of 5 percent of the total chip area to deliver typical performance gains of between 10 and 20 percent. (Optimized applications can see bigger gains. On the other hand, applications that are already efficiently using the CPU's execution units--or that are bottlenecked in ways that SMT can't assist with--may see no gain at all.)
Ultimately SMT is just one performance feature among many that may or may not be a match for a given processor's design. In Intel's case, it's been in some x86 designs but not others since it debuted on the Pentium 4; Itanium uses a simpler temporal multi-threading approach.
SMT's in the plus column of the features checklist. But what really matters is overall processor performance on relevant workloads and platform capabilities. SMT is one tool to get there.
Intel has slipped out a revised schedule for its next-generation Itanium processor, code-named Tukwila. Again. This time it's into 2010.
Intel released a statement Thursday on the schedule changes. It reads in part:
During final system-level testing, we identified an opportunity to further enhance application scalability best optimized for high-end systems. This will result in a change to the Tukwila shipping schedule to Q1 2010.
In addition to better meeting the needs of our current Itanium customers, we believe this change will allow Tukwila systems a greater opportunity to gain share versus proprietary RISC solutions including Sparc and IBM Power. Tukwila is tracking to 2x performance vs its predecessor chip. This change is about delivering even further application scalability for mission critical workloads.
That may be true. However, the fact remains that this is yet another delay to the program. This will put Tukwila's introduction more than two years after the debut of the current "Montvale" generation--which itself was a delayed and modest speedbump to "Montecito"--and one that Intel barely announced publicly.
Tukwila has had an especially bumpy history. This generation of Itanium processor began life as a chip project code-named Tanglewood and was said to be envisioned as a radical multicore design by the ex-Digital Equipment Alpha engineers who worked on it.
First, Intel changed the code-name to Tukwila after the Tanglewood Music Festival complained. This was back in 2003--to give you an idea of how long this particular project has been weaving its way through development. At that time, it was slated for something in the neighborhood of a 2007 release.
Then the chip apparently went through a variety of significant design changes. It will still be the first Itanium to sport Intel's serial processor communications link (QuickPath Interconnect--QPI) and integrated memory controllers. Those are both major enhancements, but otherwise Tukwila is a more conventional quad-core evolution of current Itanium designs. It will also be manufactured with a 65-nanometer process instead of the denser 45-nanometer process already used by the newest Intel Xeon CPUs. Along the way, the chip's schedule has been publicly pushed back a number of times, now to early 2010.
As a practical matter, delays to Itanium matter less to Intel and the server makers that use it (meaning Hewlett-Packard first and foremost) than in the case of x86 Xeon, where a delay of a few months can have a major revenue impact--vis-a-vis Advanced Micro Devices' Barcelona.
Buyers of high-end servers like HP's Superdome and NonStop value vendor relationships, reliability, and a wide range of enterprise-class capabilities far more than they do the last drop of performance. HP has done a good job of things like leveraging its c-Class BladeSystem infrastructure for its Itanium-based Integrity servers and putting together systematic go-to-market programs with partners such as SAP.
Nonetheless, at some point, ongoing delays have to hurt competitiveness--especially given how IBM's Power systems have been hitting on all cylinders the past few years.
There's been a lot of speculation that Oracle purchased Sun for its software assets like Java, Solaris, and--although this point has seen more debate--MySQL. Even those of us who viewed the acquisition as a serious play by Oracle to become a full-fledged system vendor figured those systems would be mostly x86. That's not to say Oracle would kill SPARC processor development and servers outright--the installed base is too large and profitable--but it would be a business to milk, not to invest in.
However, Oracle CEO Larry Ellison, writing in an e-mail interview with Reuters, claims to have big plans for Sun's server business--including its in-house processor design capabilities.
Ellison begins by stating that "we are definitely not going to exit the hardware business." It doesn't get much more definitive than that as to Oracle's overall strategy of being a systems company.
What Ellison has in mind here is integration. He goes on to write that:
While most hardware businesses are low-margin, companies like Apple and Cisco enjoy very high-margins because they do a good job of designing their hardware and software to work together. If a company designs both hardware and software, it can build much better systems than if they only design the software. That's why Apple's iPhone is so much better than Microsoft phones.
Those are fair points. And Oracle has itself experimented with hardware/software integration such as the Exadata Storage Server that uses HP hardware.
At the same time, the idea that you can be in the server business and only sell into the profitable niches strikes me as a notion that Oracle may not want to depend upon too much. (Cisco has made similar statements with respect to its Unified Computing System.) The history of the system vendor business going back at least a decade suggests that the most successful companies have supply chains and partner networks that allow them to sell pallets of small servers in addition to a smaller number of highly profitable large ones.
Ellison then goes on to make it equally clear that he's not interested in just bundling software and hardware but deeply optimizing the hardware when he writes: "Once we own Sun we're going to increase the investment in SPARC. We think designing our own chips is very, very important... Right now, SPARC chips do some things better than Intel chips and vice-versa."
By way of background, Sun's CMT SPARC chips are designed around a philosophy of handling many tasks in parallel even if it means that individual tasks may run somewhat slower than on a chip with fewer but more powerful cores. This approach lends itself well to workloads that involve a lot of relatively independent activities--such as Web and application servers. It also lends itself to very power-efficient designs.
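Some back-of-the-envelope arithmetic shows the trade-off. The two chips below are hypothetical, with made-up thread counts and speeds rather than figures for any real SPARC or x86 part, and the model assumes fully independent, equal-sized tasks.

```python
import math


def time_to_finish(num_tasks, work_per_task, hardware_threads, speed_per_thread):
    """Time to run independent, equal-sized tasks, in arbitrary units."""
    # Tasks run in "waves," one task per hardware thread at a time.
    waves = math.ceil(num_tasks / hardware_threads)
    return waves * work_per_task / speed_per_thread


# Hypothetical chips: (hardware threads, relative speed of each thread).
chips = {"few fast cores": (4, 1.0), "many slow threads": (32, 0.5)}

for name, (threads, speed) in chips.items():
    one = time_to_finish(1, 1, threads, speed)
    many = time_to_finish(64, 1, threads, speed)
    print(name, "| 1 task:", one, "| 64 tasks:", many)
```

In this model the many-threaded chip takes twice as long to finish a single task but gets through 64 independent requests four times faster, which is the sort of workload a busy Web or application server presents.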
But Ellison isn't just arguing that SPARC is good for some things and x86 is good for others. He's arguing for hardware that is truly optimized for Oracle software. As he writes:
Some system features work much better if they are implemented in silicon rather than software. Once we own Sun, we'll be able to plan and synchronize new features from silicon to software, just like IBM and the other big system suppliers. We want to work with Fujitsu to design advanced features into the SPARC microprocessor aimed at improving Oracle database performance.
There remain plenty of questions about how large Oracle's investments will be and how much it will tilt toward its own processor-server-operating system-middleware-applications stack. It will, of course, continue to sell software to run on HP, IBM, Dell, or wherever else it can garner license revenue from.
However, on the face of it Oracle has grand visions for its Sun acquisition that go well beyond selectively mining some key software assets and milking the rest. Oracle's purchase of Sun was the latest example of the general shift back to a more vertically integrated computer industry. This latest interview with Ellison makes that point again--with exclamation points.






