Speeds and Feeds

October 14, 2009 5:55 AM PDT

The factor factor, part 3

by Peter Glaskowsky
  • 8 comments

In part 1 and part 2 of this series, I claimed that there is apparently a secret rule in the microprocessor industry that determines the success--or failure--of new chip designs.

The failures included RISC processors, media processors, and intelligent RAM chips, which all sank in spite of clearly demonstrable advantages over alternative solutions. The great success is the programmable graphics processing unit (GPU), which has succeeded in spite of the sometimes wrenching shifts in programming methods and PC system architecture that have been required to support it.

So what's the secret? Simply this: a factor-of-two advantage, even if it's an inherent, persistent advantage, isn't enough to unseat an incumbent solution in the face of even the mildest competitive disadvantage. Without a factor of 10--a full order of magnitude--a new product won't even get a foot in the door.

That's why I call this rule the "factor factor." It isn't enough to be a few times faster than the existing alternatives. Given the performance consequences of Moore's Law, it's easier for your potential customers to wait a few years rather than spend a few years adapting to your "issues." You need to be much faster than the products you're trying to replace. The target factor is 10--no less.

Sometimes, even a tenfold advantage isn't enough. One order of magnitude is enough to overcome one disadvantage, such as a change of programming methods. Add another simultaneous disadvantage, however, like the serious constraint in local memory capacity imposed by the IRAM concept, and the new technology may need a factor of 100 in performance to win a place in the market.

Overall, a new product must deliver net benefits amounting to as much as a full order of magnitude in cost, performance, or productivity to compensate for each significant disadvantage. That's just what it takes to motivate customers to deal with the problems rather than waiting for Moore's Law to speed up the solutions that are already familiar to them.
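
As a rough illustration, the rule can be turned into arithmetic. The sketch below uses hypothetical function names, and the two-year doubling period is an assumed reading of Moore's Law rather than a figure from this post.

```python
import math

def required_advantage(drawbacks: int) -> int:
    """Rule of thumb: one order of magnitude per significant drawback."""
    return 10 ** drawbacks

def years_for_incumbent_to_catch_up(advantage: float, doubling_years: float = 2.0) -> float:
    """Years of waiting before an incumbent matches `advantage`,
    assuming its performance doubles every `doubling_years` (an assumption)."""
    return math.log2(advantage) * doubling_years

for factor in (2, required_advantage(1), required_advantage(2)):
    years = years_for_incumbent_to_catch_up(factor)
    print(f"{factor}x advantage: incumbent catches up in about {years:.1f} years")
```

Under that assumed cadence, waiting erases a twofold advantage in about two years, while a tenfold advantage takes more than six. That gap is the intuition behind the rule.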

The introduction by Advanced Micro Devices of the AMD64 instruction set (also known as EM64T or "Intel 64" on Intel processors, or generically as x86-64) represents the ultimate success case for the factor factor.

Athlon 64 processor

AMD's Athlon 64 debuted the AMD64 instruction-set architecture.

(Credit: Advanced Micro Devices)

This isn't immediately clear, I suppose. Adopting the AMD64 standard required a lot of work by operating system vendors and software developers, and the performance benefit was relatively mild in most cases. But still, AMD64 was an immediate success because the performance benefit in certain applications--those that simply wouldn't fit into a 32-bit address space--was practically infinite.

Although the factor factor seems obvious--or at least it should--it's still at the heart of many failed products and hundreds of millions of dollars of wasted investments every year.

In Silicon Valley, as in other chip-design centers around the world, projects rarely fail because of poor execution. In most projects, the engineers are good at their jobs, the managers are good at coordinating their work, and the investment is sufficient to get the work done.

Most projects fail at the conceptual level, before the detail design work even begins. The factor factor is only one of many reasons for these failures, of course, but it's the one that disturbs me the most because it's the easiest to anticipate.

This rule doesn't apply to all products. When a new chip for an existing market is architecturally compatible with previous products, a factor-of-two performance improvement is plenty. Even smaller benefits can justify the costs of developing a new product if there are few, if any, disadvantages associated with it.

Multicore CPUs are one of these products, at least for now. Process technology makes it pretty easy to double core counts. Dual-core CPUs were almost a drop-in replacement for single-core chips and caused no serious problems. Quad-core chips were the same thing again. Eight-core CPUs may be a lesson in diminishing returns, but I'm sure they'll be commercially successful.

Beyond that, we'll have to see how it goes. The critical advantage of the CPU over the GPU is high performance on inherently serial processing tasks (what we sometimes call "single-threaded applications"). On a typical PC, there are rarely more than a few of these tasks running at any given moment. It's always useful to have a few extra cores available for parallel tasks, but at some point (I'm thinking somewhere around the 16-core level), PC buyers are likely to stop paying extra for more cores.

Even mighty Intel could find itself on the wrong side of the factor factor. Given that quad-core chips became a mainstream product just this year, we can expect to see 16-core processors for ordinary desktop PCs in 2013 and laptops in 2015 or so. By that time, the GPU could be the incumbent solution for high-performance parallel processing, and multicore CPUs could be the technology looking for compelling performance advantages.
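
The desktop date is consistent with a simple doubling cadence. Here is a minimal sketch, assuming core counts double every two years from a four-core mainstream baseline in 2009. The cadence is an assumption inferred from the estimate above, and the function name is hypothetical.

```python
import math

def year_core_count_reached(target_cores, base_cores=4, base_year=2009,
                            years_per_doubling=2):
    """Project the year a core count goes mainstream under an assumed doubling cadence."""
    doublings = math.log2(target_cores / base_cores)
    return base_year + doublings * years_per_doubling

print(year_core_count_reached(8))   # one doubling after the 2009 baseline
print(year_core_count_reached(16))  # two doublings after the 2009 baseline
```

With those assumptions, 16 cores is two doublings away, which lands on 2013 for desktops.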

So...now you know the supposed secret. When you hear about a radical new microprocessor architecture, you can do what I do: imagine the numeral "1" followed by a "0" for each drawback you see in the proposal. Compare that figure with the claimed benefits and you'll know which way to bet.

By the way, kudos to CNET users divisionbyzero and TrinityTrident, who proved my point that this rule isn't really a secret by explaining it in their comments on the previous posts in this three-part series.

Now if someone could only explain why so many companies don't seem to know this rule!

October 7, 2009 8:07 AM PDT

ATI and Nvidia face off--obliquely

by Peter Glaskowsky
  • 8 comments

Nvidia and Advanced Micro Devices' ATI division are taking different approaches to graphics processing in the next generations of their products. Both strategies have strengths and weaknesses, and I think it's too soon to pick the eventual winner in this long-running fight.

Before I get into my analysis, I should say that Nvidia paid me to write a white paper on the implications of its new GPU architecture (code-named Fermi) for high-performance computing applications. The white paper was released as part of the Fermi launch event at Nvidia's GPU Technology Conference last week.

Nvidia also paid for white papers from two other well-known microprocessor analysts, Nathan Brookwood of Insight64 and my friend and former colleague Tom Halfhill of Microprocessor Report. UC Berkeley professor David Patterson wrote a fourth white paper, and Nvidia wrote one of its own. All of these works take a different approach to the subject; all are worth reading if you need to understand what Fermi is all about.

In short, I think the Fermi architecture has been more thoroughly white-papered than any graphics chip design in history. All five of these documents are available on the Fermi home page on Nvidia's Web site, and just in case that page is moved or changed, you're welcome to take advantage of my own mirror of my white paper.

I've spent much of the last several days reading these documents plus David Kanter's excellent article on Fermi over on his Real World Technologies site. David managed to get some details on Fermi that Nvidia didn't give to the rest of us.

I've also had time to go through the coverage of ATI's recent launch of the RV870, which is what Nvidia's Fermi-based chips will be competing against. The first of Nvidia's chips bears the internal code name of GF100, and it's huge. Here's a life-size photo:

...

August 31, 2009 5:35 AM PDT

High-end server chips breaking records

by Peter Glaskowsky
  • 3 comments

How would you like a single-chip microprocessor with more than four times the performance (on some applications) of Intel's best Core i7?

Then consider that up to 32 of these chips can be directly connected to form a single server, achieving four times the built-in scalability of Intel's next-generation Nehalem-EX processor.

That's IBM's widely anticipated Power7, which it described at last week's Hot Chips conference. But if you're interested, you'd better be prepared to spend a lot more than four times as much per chip. IBM isn't talking about pricing, but large Power servers can cost more than $10,000 per processor.

IBM Power7 die photo

IBM's forthcoming Power7 server processor has eight cores, manages 32 threads, and includes 32MB of on-chip embedded DRAM cache. Power7 also has the highest levels of off-chip bandwidth ever achieved by a microprocessor.

(Credit: IBM)

What makes the Power7 so powerful? Each chip has eight cores, and each core supports four-way multithreading. There's 32MB of level-3 cache on the chip, made using embedded DRAM (eDRAM) cells. Most CPUs use SRAM for cache because it's generally easier to combine with high-performance logic, but DRAMs--with only one transistor per bit--offer compelling density advantages. IBM spent years developing a new kind of eDRAM that would work with SOI (silicon on insulator) manufacturing processes, and the Power7 is the most advanced product to use the new technology.

Interestingly, the Power7 cores run much more slowly than those in the Power6 processor, which I wrote about here in 2007 ("Live from Hot Chips 19: Session 1, IBM's Power6"). The Power6 was designed to run very fast using a long CPU pipeline in order to deliver the highest possible performance on each thread of execution.

Maybe that strategy didn't work out as well as IBM hoped, because the Power7 returns to a more traditional microarchitecture with a shorter pipeline and much lower clock rates--though IBM didn't say exactly what those rates would be.

IBM did, however, promise that the Power7 would be roughly four times as fast as the Power6, chip for chip. Since it has four times as many cores, each of the new slower-clocked cores must still deliver about as much performance as those in the previous generation.

Chip-level performance must always be matched by off-chip connections lest the incoming data or outgoing results be bottlenecked by a too-slow channel. Accordingly, the Power7 is equipped with eight I/O channels for DRAM, each of which connects to an off-chip buffering device that splits the channel into two 64-bit DRAM interfaces. Altogether, IBM says the Power7 has 180 GBps of DRAM interconnect that can sustain over 100 GBps of effective memory bandwidth.

There's another 50 GBps of peak I/O bandwidth and a staggering 360 GBps of peak bandwidth used to let each Power7 chip communicate with others. The DRAM connected to each chip is thus shared across larger systems.

Combining these figures, IBM says a single Power7 has 590 GBps of total off-chip bandwidth. This isn't the real number, since many of those bytes are used for error-correcting codes and other overhead, but it's still pretty impressive.
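
The total is just the sum of the three peak figures quoted above. A minimal sketch follows; the dictionary keys are illustrative labels, not IBM's terminology.

```python
# Peak off-chip bandwidth figures quoted for Power7, in GBps.
power7_bandwidth_gbps = {
    "dram_interconnect": 180,
    "io": 50,
    "chip_to_chip": 360,
}

total_gbps = sum(power7_bandwidth_gbps.values())

# Sustained DRAM bandwidth is quoted as "over 100 GBps", so this is a lower bound.
min_dram_efficiency = 100 / power7_bandwidth_gbps["dram_interconnect"]

print(f"total peak off-chip bandwidth: {total_gbps} GBps")
print(f"sustained/peak DRAM bandwidth: at least {min_dram_efficiency:.0%}")
```

The second figure shows how much of the raw DRAM interconnect survives as effective bandwidth, which is one reason the headline total overstates what software actually sees.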

So is Power7's die size: 567 square millimeters for 1.2 billion transistors. That's nearly a square inch! IBM says that if the 32MB L3 cache had been manufactured using SRAM, the transistor count would have been 2.7 billion instead.
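
IBM's SRAM comparison can be roughly reproduced with a back-of-envelope model. This sketch assumes one transistor per eDRAM bit, six per SRAM bit, and optionally a 72/64 ECC overhead on the stored bits. It also assumes the 1.2 billion figure counts each eDRAM cell as one transistor. None of these assumptions come from IBM's presentation, and the function names are hypothetical.

```python
def cache_transistors(megabytes, transistors_per_bit, ecc_overhead=1.0):
    """Transistors in the cache data array alone (tags and redundancy ignored)."""
    bits = megabytes * 1024 * 1024 * 8 * ecc_overhead
    return bits * transistors_per_bit

def sram_equivalent_total(actual_total, megabytes, ecc_overhead=1.0):
    """Swap 1T eDRAM cells for assumed 6T SRAM cells and return the new chip total."""
    edram = cache_transistors(megabytes, 1, ecc_overhead)
    sram = cache_transistors(megabytes, 6, ecc_overhead)
    return actual_total - edram + sram

for label, ecc in (("no ECC", 1.0), ("72/64 ECC", 9 / 8)):
    total = sram_equivalent_total(1.2e9, 32, ecc)
    print(f"{label}: about {total / 1e9:.2f} billion transistors")
```

The estimate comes out near 2.5 billion without ECC and near 2.7 billion with it, so the quoted figure is at least plausible under these assumptions. How IBM actually derived its number is not stated.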

Still, Power7 wasn't the only high-end chip talked about at Hot Chips.

Rainbow Falls, a record for core count

Sun Microsystems was there to describe its forthcoming Rainbow Falls chip, which I assume will be marketed as the UltraSparc T3. The chip has 16 cores, each of which is reportedly able to manage 8 threads.

Sun's primary Rainbow Falls presentation focused on details of Rainbow Falls' internal and external interconnects; a second talk described the cryptographic coprocessors present in each of the chip's cores. These coprocessors--one for modular arithmetic (commonly used in public-key cryptography) and a cipher/hash unit to accelerate bulk ciphers like AES and secure hash algorithms--provide many times the performance of pure software implementations.

Fujitsu was also at Hot Chips to describe its eight-core, 2GHz Sparc64 VIIIfx processor, the latest in a long series of impressive designs from the company. Fujitsu quoted a peak performance figure of 128 GFLOPS (billions of floating-point operations per second) with a typical power consumption of just 58 watts. It did not, however, provide sustained performance or worst-case power consumption figures.
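
Fujitsu's peak figure implies a per-core throughput that is easy to back out. A minimal sketch, using only the numbers quoted above (the function name is hypothetical):

```python
def flops_per_cycle_per_core(peak_gflops, cores, clock_ghz):
    """Floating-point operations each core must complete per clock to hit the peak."""
    return peak_gflops / (cores * clock_ghz)

per_core = flops_per_cycle_per_core(peak_gflops=128, cores=8, clock_ghz=2.0)
peak_gflops_per_watt = 128 / 58  # peak performance over typical (not worst-case) power

print(f"{per_core:.0f} FLOPs per cycle per core")
print(f"{peak_gflops_per_watt:.1f} peak GFLOPS per typical watt")
```

Both results are peak-over-typical ratios, so they share the caveat in the text: without sustained performance or worst-case power figures, they are upper bounds rather than measurements.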

AMD, Intel vie for high-volume servers

Few of us will have direct exposure to the IBM, Sun, and Fujitsu chips. A pair of presentations from Advanced Micro Devices and Intel described products that will be much more widely available.

AMD launched its six-core Opteron processor code-named "Istanbul" earlier this year (see Brooke Crothers' coverage from June). Next year the company will begin shipping a new Opteron model currently code-named Magny-Cours (after a racetrack in France). Magny-Cours will consist of two Istanbul chips in a single package, with twice as many DRAM interfaces to support the new processor's increased performance.

AMD also teased the audience with another mention of a new processor core design that has been under development there for several years: "Bulldozer," which is now targeted at 32nm process technology. This new core will incorporate new x86 instruction-set extensions which will probably not be adopted by Intel (a strategy that reminds me of AMD's old 3DNow extensions).

But saving the best for last--best, that is, from the perspective of anticipated sales--Intel's talk on Nehalem-EX showed just how far Intel has been able to push the technology envelope for high-volume servers.

Nehalem-EX is an eight-core version of the existing quad-core Nehalem design. The new chip also has 24MB of L3 cache done in old-school SRAM. By my calculations, about 60 percent of the chip's 2.3 billion transistors are in this cache alone.
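
One way to arrive at a figure near 60 percent is sketched below. It assumes conventional six-transistor SRAM cells and a 72/64 ECC overhead on the data array. The calculation behind the estimate in the text is not shown, so both assumptions and the function name are illustrative only.

```python
def sram_cache_transistors(megabytes, transistors_per_bit=6, ecc_overhead=9 / 8):
    """Transistors in an SRAM data array (assumed 6T cells, assumed ECC overhead)."""
    bits = megabytes * 1024 * 1024 * 8 * ecc_overhead
    return bits * transistors_per_bit

chip_transistors = 2.3e9
with_ecc = sram_cache_transistors(24) / chip_transistors
without_ecc = sram_cache_transistors(24, ecc_overhead=1.0) / chip_transistors

print(f"cache share with ECC:    {with_ecc:.0%}")
print(f"cache share without ECC: {without_ecc:.0%}")
```

Tag arrays and redundant rows would push the share somewhat higher than the raw data array alone, so a result in the high 50s is in line with "about 60 percent."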

Nehalem-EX provides four links to external DRAM buffer chips supporting two DDR3 DRAM interfaces each (much like the Power7 solution) and four QuickPath Interconnect links that provide direct "glueless" connections for up to eight-processor systems (64 cores, 128 threads). Intel is also working on an external Node Controller chip for systems with up to 2,048 Nehalem-EX processors.

The aggregate bandwidth numbers for Nehalem-EX aren't as mind-boggling as those for Power7, but they're still far beyond anything available for PC-architecture servers today. Based on the presentation, I estimate Nehalem-EX could boast over 85 GBps of peak memory bandwidth and 100 GBps of chip-to-chip bandwidth, some of which must be allocated to I/O.

I expect the raw number-crunching performance of the Nehalem-EX cores to be roughly on the same level as Power7's cores. The lower ratio of bandwidth to processing power for Nehalem-EX reflects a different design target, not a design shortfall--and most importantly, a much lower selling price. There will presumably be versions of Nehalem-EX priced similarly to existing Xeon MP products, which currently top out at $2,301 each in small volumes, but that's a very reasonable price to pay for the market's most advanced x86 server processor.

August 28, 2009 9:50 AM PDT

OpenCL: Parallel programmers' new best friend

by Peter Glaskowsky
  • 11 comments

Apple's Snow Leopard operating system, which hits the streets on Friday, has plenty of new technology--but one of its major new features will soon be available on Microsoft Windows, Linux, and other major platforms.

OpenCL, the Open Computing Language, was originally proposed by Apple to support parallel programming on GPUs. There are other GPU programming languages, such as Nvidia's CUDA (Compute Unified Device Architecture) extensions for C and the Brook stream programming language developed at Stanford University and included in Advanced Micro Devices' Stream Computing software development kit, but rather than choosing one of these languages, Apple chose to create a new standard independent of the big graphics vendors.

In fact, OpenCL is even independent of Apple. One of the first things Apple did was offer to hand it over to the Khronos Group, the same independent standards organization that manages the OpenGL standard for 3D rendering.

OpenCL working group member logos

Supporters of the OpenCL standards effort at the Khronos Group include the biggest CPU and GPU makers in the industry. Apple is also involved but not shown here.

The members of the OpenCL working group turned Apple's draft specification into the released version 1.0 spec in just six months (see Brooke Crothers' "OpenCL goes beyond Apple" from last December)--and in the process, they created what may be the best solution so far to the general problem of parallel programming.

See, OpenCL isn't just for GPUs. It was designed from the beginning to get the most out of multicore processors too. After all, if you have a multicore CPU--and you probably do--why let it go to waste? OpenCL is flexible enough to support both CPU-optimized and GPU-optimized code, and smart enough to choose the right code, depending on what hardware is available in the system to run it. Most of the competing parallel-programming languages can't do that.

OpenCL can take advantage of both task-level parallelism (running many tasks at once, whether different tasks or copies of the same task) and data-level parallelism (where a single instruction within a task is applied to multiple data items at once--also known as SIMD). Some parallel-programming languages can't do that, either.
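
To make the two kinds of parallelism concrete, here is a toy model in plain Python rather than real OpenCL code. In the data-parallel half, a kernel is written once and invoked for every index in a range. In the task-parallel half, independent tasks are handed to a pool. `run_over_range` and `scale_and_add` are hypothetical names, and the data-parallel loop runs sequentially here, where real hardware would process many items at once.

```python
from concurrent.futures import ThreadPoolExecutor

def scale_and_add(i, a, x, y, out):
    """Kernel body: the same operation, applied to item i only."""
    out[i] = a * x[i] + y[i]

def run_over_range(kernel, size, *args):
    """Stand-in for a data-parallel launch: call the kernel once per index."""
    for i in range(size):  # sequential here; conceptually every index runs at once
        kernel(i, *args)

x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)

# Data-level parallelism: one operation over many data items.
run_over_range(scale_and_add, len(x), 2.0, x, y, out)

# Task-level parallelism: different, independent tasks submitted together.
with ThreadPoolExecutor(max_workers=2) as pool:
    total = pool.submit(sum, x)
    largest = pool.submit(max, y)
    task_results = (total.result(), largest.result())

print(out)
print(task_results)
```

The point of the kernel-per-index style is that the body never mentions how many items exist or in what order they run, which is what lets a runtime spread the work across whatever hardware is available.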

But OpenCL's biggest advantage isn't technical in nature: it's that no other parallel-programming language will be so widely supported. The support starts with Snow Leopard but will go well beyond that. AMD and Nvidia will have OpenCL drivers for their GPUs under Windows and Linux. AMD and Intel will support OpenCL on their CPUs (including Intel's Larrabee). And AMD has already shipped its first OpenCL implementation for its Athlon and Opteron processors.

Implementations for video game consoles and DSPs (digital signal processors) are also under development. I've even heard that future releases of OpenCL may be able to work with less common hardware, such as FPGAs (field-programmable gate arrays).

We had an excellent half-day OpenCL tutorial last weekend at Hot Chips 21. There were also some great OpenCL presentations at Siggraph 2009 earlier this month; if you'd like more detailed information, that's a good place to start.

All this support for OpenCL means that it should become the first choice of academic and commercial developers who want a good cross-platform way to develop parallel code. Expect to see OpenCL used in software for audio and video processing, cryptography, medical imaging, and many other applications--including, of course, gaming.

(Disclosure: I will be writing a technical white paper for Nvidia, one of the companies covered in this story.)

June 17, 2009 5:01 AM PDT

GPUs and the new 'digital divide'

by Peter Glaskowsky
  • 5 comments

I spent Tuesday at Nvidia headquarters, attending the company's annual Analyst Day.

I've been to most of Nvidia's analyst events over the last decade or so, since I covered Nvidia almost from its inception while working as the graphics analyst at Microprocessor Report. These meetings are always a good way to get an update on the company's business operations, and sometimes--like this time--one provides exceptionally good insight into larger industry trends.

Nvidia's GeForce GTX 280 graphics chip

(Credit: Nvidia)

Nvidia has had a rough couple of quarters in the market, which CEO Jen-Hsun Huang blamed in part on a bad strategic call in early 2008: to place orders for large quantities of new chips to be delivered later in the year. When the recession hit, these orders turned into about six months of inventory, much of which simply couldn't be sold at the usual markup.

In response, Nvidia CFO David White outlined measures the company plans to take to increase revenue, sell a more valuable mix of products, reduce the cost of goods sold, and cut back on Nvidia's operating expenses.

Three things stood out for me in this presentation:

Nvidia is planning an aggressive transition to state-of-the-art ASIC fabrication technology at TSMC, the company's manufacturing partner. Within "two to three quarters," White said, about two-thirds of the chips Nvidia sells will be made using 40-nanometer process technology. (The first of these chips were announced Tuesday.)

White also acknowledged something that I've long assumed to be true: Nvidia receives "preferential allocation" on advanced process technology at TSMC. It's logical that Nvidia should get the red-carpet treatment, having been TSMC's best customer for many years, but I don't recall hearing Nvidia or TSMC put this fact on the record before.

The third notable point from White's presentation: the gross margins for Nvidia's Tegra, an ARM-based application processor, are much higher than I'd have guessed, at "over 45 percent." (Nvidia's Mike Rayfield, general manager of the Tegra division, says Tegra has already garnered 42 design wins at 27 companies.) That's quite excellent for an ARM-based SoC; it's a very competitive market.

More surprises

The technical sessions at the event contained their own surprises.

For example, Nvidia effectively seized control of an old Intel marketing buzzword: "balanced."

For years, Intel used to talk about ...


April 3, 2009 5:01 AM PDT

Sizing up new high-end machines from HP, Apple

by Peter Glaskowsky
  • 29 comments

Last week, I attended a press event in Los Angeles hosted by Hewlett-Packard's workstation business unit. Hewlett-Packard was preparing for this week's announcement of three new Z-series workstation models: the Z400, Z600, and Z800.

HP briefed the reporters and analysts on all the key details of the products (the speeds and feeds, as we say), took us to visit a couple of HP's key customers in the area, and hosted presentations by software partners and more customers.

The new HP Z-Series workstations.

(Credit: Hewlett-Packard)

The workstations are very nice, especially the Z600 and Z800: high-quality dual-processor systems based on Intel's newest Xeon 5500-series processors with specific adaptations to distinguish them from ordinary PCs. Even the Z400, though based on a more basic PC-like design, uses a single Xeon processor and provides two 16-lane PCI Express Gen2 slots.

The customer visits were well chosen: one at BMW Designworks and another at DreamWorks, the movie studio that just released Monsters vs. Aliens.

BMW Designworks actually assisted with the industrial design of the new HP workstations. They're handsome machines, but not exactly pretty--certainly not in the way Apple's Mac Pro is.

More importantly, however, the HP-BMW design is functionally superior. In about the same case size as the Mac Pro, HP's Z800 has room for more RAM, more expansion cards, and more disk drives. BMW also worked handles into the design, and they work better than Apple's.

The difference in RAM is quite substantial. It isn't just about the slots (eight in the Mac Pro, twelve in the Z800); it's even more about the fact that HP supports 16GB dual in-line memory modules (DIMMs), while Apple's machine goes only up to 4GB per slot. That's 192GB for the HP and 32GB for the Mac.

To be fair, HP is merely promising to offer 16GB DIMMs by the end of 2009; you can't get them today. Apple rarely preannounces anything, so it's possible that the Mac Pro will support more RAM by then, but HP's advantage in slot count should keep it on top.

More RAM can often give more performance than a faster CPU, especially in memory-hungry engineering applications. If the software overflows the physical memory and must start using virtual memory, performance can plummet.

These are very nice machines. But they're also expensive. The Z800 starts at less than $2,000 (actually a good bit cheaper than the Mac Pro's entry price), but most buyers will aim higher. In fact, it's no big deal to spend $10,000 or more on a high-end workstation.

Does that seem like a lot of money to spend on a PC for business use at a time when many businesses are struggling? Quite the opposite, I think.

The truth is, the cost of a superior PC is almost trivial, compared with the value it can generate in the hands of a highly skilled designer.

HP tried to make this point in its presentations at the event, but it was very conservative in its figures. First, it assumed that the total cost per employee (including salary, benefits, office space, management overhead, etc.) was just $60 per hour, which is very low. Second, it shouldn't have been using a cost model at all!

The more useful basis for this analysis is revenue per employee, which can easily exceed $250 per hour for the kind of workers who can make effective use of a high-price workstation.

For an employee generating this kind of value, a $10,000 workstation justifies its purchase remarkably quickly. Even if the employee's productivity improves just 10 percent, the payback period is a mere 10 weeks.
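
The payback arithmetic is simple enough to check directly. This sketch assumes a 40-hour work week, which is not stated above, and the function name is hypothetical.

```python
def payback_weeks(price, revenue_per_hour, productivity_gain, hours_per_week=40):
    """Weeks until the extra revenue from a productivity gain covers the purchase price."""
    extra_revenue_per_week = revenue_per_hour * productivity_gain * hours_per_week
    return price / extra_revenue_per_week

weeks = payback_weeks(price=10_000, revenue_per_hour=250, productivity_gain=0.10)
print(f"payback period: {weeks:.0f} weeks")
```

At $250 per hour, a 10 percent gain is worth $25 per hour, or $1,000 per 40-hour week, so a $10,000 machine pays for itself in 10 weeks. The result scales linearly with price, so the same function works for more expensive machines.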

It's worth thinking about what it takes to generate a 10 percent improvement in overall productivity. It isn't just a matter of computer performance, but performance helps. These new HP workstations are much faster than the older models, due to the combination of the faster CPUs, faster and more RAM, and a new generation of professional graphics cards from Nvidia and Advanced Micro Devices' ATI.

Performance affects productivity through the time the user spends waiting for the computer, so that's what to look for. Assuming that the software is working as well as it can, and the user's work habits are reasonable, processing delays for engineering visualizations, animation previews, circuit simulations, and similar tasks can really add up.

So it's no surprise to me that there's still a market for pricey dual-processor workstations.

What does surprise me is that there aren't more companies trying to rebuild the market for super high-end workstations.

SGI, in its glory days, used to be able to sell some pretty amazing machines for professional users. I have an SGI Octane workstation that originally sold for over $50,000. That seems like crazy money, but even a $50,000 workstation in the right hands could still pay for itself in less than a year, a reasonable return on investment.

Alas, SGI went bankrupt again this week and then promptly sold itself to Rackable Systems for $25 million plus the assumption of SGI's debts.

I'm sad that SGI is gone, but it wasn't the workstation business that killed the company, and the numbers show that market niche still exists. HP could occupy that niche, if it chose, as could any company that makes four- and eight-processor servers, which share most of the same engineering issues.

Some small companies, such as Boxx Technologies (which I wrote about last summer in "Boxx fills in for a failing SGI") and HPC Systems, make bigger workstations, but both of these vendors' product lines are stuck with AMD Opteron processors at the moment, which are no longer performance-competitive with the new Xeons.

Later this year, new multiprocessor-capable Xeon processors will arrive that could reinvigorate the super-workstation market, and I hope that some of these companies step up to the challenge. I believe that there's some good money to be made there, and the rest of the world economy will benefit at the same time.

March 13, 2009 2:22 AM PDT

A 'post-x86 world'? Preposterous!

by Peter Glaskowsky
  • 40 comments

I honestly don't know whether Om Malik's blog site, GigaOM, is intended to be informative or merely entertaining. I pointed out a previous example of the overwrought rhetoric that permeates that site last September (in the context of Comcast's then-new usage cap policy), but generally, I try to ignore the nonsense there for the same reasons that I ignore talk radio.

But like it or not, GigaOM is widely read, and sometimes when a post there bears directly on a market that's important to me, I can't bear to let it go. This is one of those times.

On Thursday, a GigaOM staffer wrote a piece titled "Can Intel Thrive in a Post x86 World?"

A slide from Fred Weber's keynote presentation at Microprocessor Forum 2003 showing how x86 will evolve into systems from big servers down to handheld consumer devices.

(Credit: Advanced Micro Devices, Inc.)

The headline is preposterous from beginning to end. It has two implications just in the eight words of the title: that Intel's ability to "thrive" faces any imminent threats, and that the importance of the x86 architecture is declining.

In January, the same staffer wrote a piece titled "Netbooks and the Death of x86 Computing" which reached the fantastic conclusion that Netbooks would "destroy the hegemony of x86 machines for personal computing."

Well, as I pointed out just a few weeks later (in "The Netbook is dead. Long live the notebook!"), when the Netbook phenomenon ran up against the dominance of Intel and Microsoft in the PC market, it was the Netbook that died instead. Even at a $300 price point, people still want full PC compatibility.

Yes, there are companies like Freescale (the subject of the January post on GigaOM) and Nvidia that are looking to push the ARM architecture into the Netbook space. But that idea never made much sense, and now that Intel and TSMC are working together to get Intel's Atom x86 core into lower-cost SoC (system on chip) products, the ARM architecture will eventually have to retreat into the shrinking niche for supersmall, supercheap phones and consumer electronics gizmos for which x86 compatibility is of negligible value.

See, we learned a long time ago--those of us who cover this industry professionally, not just as a random assignment for some random blog--that the instruction set architecture (ISA), per se, doesn't matter anymore.

The choice of ISA was a big deal in the 1980s and early 1990s, when the extra complexity of an x86 instruction decoder was a large fraction of the total complexity of a microprocessor. That's where the conflict between RISC and CISC came from.

But by the turn of the century, ISA complexity was almost a dead issue, and that coffin's final nail was pounded in by the keynote speech of then-Advanced Micro Devices CTO Fred Weber at Microprocessor Forum 2003, an event I had the honor of hosting.

In his talk, "Towards Instruction Set Consolidation," Weber made a simple point: "Technology has passed the point where instruction set costs are at all relevant."

Even then, three generations of process technology ago, the "x86 penalty" was down to a couple of square millimeters of silicon. Today, the comparable figure is about 0.25 square millimeters. Not zero, certainly, but not a significant concern for chips that are a hundred times larger.
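
Those two figures are consistent with ordinary process scaling. A minimal sketch, assuming the area of a fixed block of logic halves with each process generation and reading "a couple" as 2 square millimeters (both are assumptions, and the function name is hypothetical):

```python
def scaled_area_mm2(area_mm2, generations, area_scale_per_generation=0.5):
    """Area of the same logic after a number of process shrinks (assumed 0.5x each)."""
    return area_mm2 * area_scale_per_generation ** generations

penalty_today = scaled_area_mm2(2.0, generations=3)
chip_area = penalty_today * 100  # "a hundred times larger"

print(f"x86 penalty after three shrinks: {penalty_today} mm^2")
print(f"a chip 100 times that size: {chip_area} mm^2")
```

Three halvings take 2 square millimeters down to 0.25, and a chip a hundred times that size is only 25 square millimeters, which is small by the standards of the processors discussed here.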

In short, ARM chips aren't cheaper or more power-efficient because of their instruction sets; they're like that because they're designed to be. And anything that an ARM chip can do to save cost or power can also be done by an x86 chip.

So there can't ever be a time when the world moves beyond x86. That's 1980s thinking, just plain ignorance of what may be the most important trend in the microprocessor industry.

The rest of Thursday's GigaOM post is a hopelessly self-contradictory muddle that fails to reach any clear conclusions. I'll just quote one more line near the end: "But the PC will be just one small (and shrinking) battleground to keep x86 relevant, amid a more mobile, visual, and power-sensitive world."

Current economic woes aside, the PC market is hardly shrinking. You know what's shrinking? The PC! As the PC shrinks, the PC market will grow. The MID (mobile Internet device) market isn't much to speak of right now, for example, but once MID makers figure out what to build, MIDs will become more popular.

And seriously, is anyone really not clear on the fact that the Apple iPhone is a computer? It isn't an embedded system. An embedded system is one in which the presence of a microprocessor is functionally irrelevant to the user. When a gizmo exposes its programmability to the user, it's a computer.

What else is the App Store but the visible manifestation of the iPhone's programmability?

Now, ARM isn't dead yet. The iPhone uses an ARM processor because there's no x86 processor that would work as well in that system. ARM processors will probably see at least two more generations in cell phones just because there's so much ARM-based software out there (including all the software on the App Store).

But somewhere around 2012, we're going to see x86 chips poking into that space. The value of instruction set compatibility with the PC market will persuade developers of new cell phone platforms to go with x86 chips, and eventually even established systems like the iPhone will switch over.

So not only are x86 chips selling into a growing PC market, they'll eventually start eating into ARM's own strongholds. That can't be bad for Intel.

And that's why the GigaOM piece was preposterous.

August 29, 2008 5:01 AM PDT

Boxx fills in for a failing SGI

by Peter Glaskowsky
  • 5 comments

I miss the old SGI. Silicon Graphics was widely regarded as the greatest computer company in Silicon Valley back in the 1990s. Sometimes forgotten--but not gone--SGI was one of our greatest success stories and one of our greatest tragedies.

Apple may have had more revenue by virtue of shipping millions of small systems, but SGI's hardware spanned the range from video-game consoles (the Nintendo 64) to workstations to supercomputers. SGI's Unix-based operating system, IRIX, was one of the most sophisticated in the industry.

I used to lust over SGI machines. I'd obsess over lists of used SGI gear, looking for a great deal that would let me have my own IRIX box at home. In 2004, I finally bought an Octane with MXI graphics... but that was years after these machines were effectively obsolete, and I paid less than 0.5% (1/200th!) of the original retail price of the machine.

In the mid-to-late 1990s, SGI was not well managed, losing huge amounts of money because its leaders would not... Read more

August 24, 2008 12:30 PM PDT

Larrabee performance--beyond the sound bite

by Peter Glaskowsky
  • 3 comments

Hello, Slashdot.

In a story on PC Pro, Nvidia architect John Montrym (whose name was incorrectly spelled "Mottram") quoted my recent blog post on Larrabee as concluding that "the 'large' Larrabee in 2010 will have roughly the same performance as a 2006 GPU from Nvidia or ATI."

Alas, this isn't really what I said or meant.

What I actually described as equating to "the performance of a 2006-vintage...graphics chip" was a performance standard defined by Intel itself--running the game F.E.A.R. at 60 fps in 1,600 x 1,200-pixel resolution with four-sample antialiasing.

Intel used this figure for some comparisons of rendering performance. If Larrabee ran at 1GHz, for example, Intel's figures show that... Read more

August 5, 2008 1:30 AM PDT

Intel's Larrabee--more and less than meets the eye

by Peter Glaskowsky
  • 13 comments

Intel announced on Monday that it will be presenting a paper at Siggraph 2008 about its "many-core" Larrabee architecture, which will be the basis of future Intel graphics processors.

The paper itself, however, has already been published, and I was able to get a copy of it. (Unfortunately, as you'll see at that link, the paper is normally available only to members of the Association for Computing Machinery.)

Figure: Larrabee block diagram. Intel's Larrabee includes "many" cores, on-chip memory controllers, a wide ring bus for on-chip communications, and a small amount of graphics-specific logic. (Credit: Intel)

The paper is a pretty thorough summary of Intel's motives for developing Larrabee and the major features of the new architecture. Basically, Larrabee is about using many simple x86 cores--more than you'd see in the central processor (CPU) of the system--to implement a graphics processor (GPU). This concept has received a lot of attention since Intel first started talking about it last year.

... Read more

About Speeds and Feeds

Silicon Valley-based computer architect and chip analyst Peter N. Glaskowsky attends a variety of industry conferences throughout the year to meet with industry thought leaders and dig into the future of computing technology. In Speeds and Feeds, he analyzes trends in system architecture and interface design, as well as market and political pressures surrounding those trends. He is a member of the CNET Blog Network and is not an employee of CNET.
