Nvidia and Advanced Micro Devices' ATI division are taking different approaches to graphics processing in the next generations of their products. Both strategies have strengths and weaknesses, and I think it's too soon to pick the eventual winner in this long-running fight.
Before I get into my analysis, I should say that Nvidia paid me to write a white paper on the implications of its new GPU architecture (code-named Fermi) for high-performance computing applications. The white paper was released as part of the Fermi launch event at Nvidia's GPU Technology Conference last week.
Nvidia also paid for white papers from two other well-known microprocessor analysts, Nathan Brookwood of Insight64 and my friend and former colleague Tom Halfhill of Microprocessor Report. UC Berkeley professor David Patterson wrote a fourth white paper, and Nvidia wrote one of its own. All of these works take a different approach to the subject; all are worth reading if you need to understand what Fermi is all about.
In short, I think the Fermi architecture has been more thoroughly white-papered than any graphics chip design in history. All five of these documents are available on the Fermi home page on Nvidia's Web site, and just in case that page is moved or changed, you're welcome to take advantage of my own mirror of my white paper.
I've spent much of the last several days reading these documents plus David Kanter's excellent article on Fermi over on his Real World Technologies site. David managed to get some details on Fermi that Nvidia didn't give to the rest of us.
I've also had time to go through the coverage of ATI's recent launch of the RV870, which is what Nvidia's Fermi-based chips will be competing against. The first of Nvidia's chips bears the internal code name of GF100, and it's huge. Here's a life-size photo:
...
I spent Tuesday at Nvidia headquarters, attending the company's annual Analyst Day.
I've been to most of Nvidia's analyst events over the last decade or so, since I covered Nvidia almost from its inception while working as the graphics analyst at Microprocessor Report. These meetings are always a good way to get an update on the company's business operations, and sometimes--like this time--one provides exceptionally good insight into larger industry trends.
Nvidia's GeForce GTX 280 graphics chip (Credit: Nvidia)
Nvidia has had a rough couple of quarters in the market, which CEO Jen-Hsun Huang blamed in part on a bad strategic call in early 2008: to place orders for large quantities of new chips to be delivered later in the year. When the recession hit, these orders turned into about six months of inventory, much of which simply couldn't be sold at the usual markup.
In response, Nvidia CFO David White outlined measures the company plans to take to increase revenue, sell a more valuable mix of products, reduce the cost of goods sold, and cut back on Nvidia's operating expenses.
Three things stood out for me in this presentation:
Nvidia is planning an aggressive transition to state-of-the-art ASIC fabrication technology at TSMC, the company's manufacturing partner. Within "two to three quarters," White said, about two-thirds of the chips Nvidia sells will be made using 40-nanometer process technology. (The first of these chips were announced Tuesday.)
White also acknowledged something that I've long assumed to be true: Nvidia receives "preferential allocation" on advanced process technology at TSMC. It's logical that Nvidia should get the red-carpet treatment, having been TSMC's best customer for many years, but I don't recall hearing Nvidia or TSMC put this fact on the record before.
The third notable point from White's presentation: the gross margins for Nvidia's Tegra, an ARM-based application processor, are much higher than I'd have guessed, at "over 45 percent." That's quite excellent for an ARM-based SoC; it's a very competitive market. (Mike Rayfield, general manager of Nvidia's Tegra division, says Tegra has already garnered 42 design wins at 27 companies.)
More surprises
The technical sessions at the event contained their own surprises.
For example, Nvidia effectively seized control of an old Intel marketing buzzword: "balanced."
For years, Intel used to talk about ...
Last week, I attended a press event in Los Angeles hosted by Hewlett-Packard's workstation business unit. Hewlett-Packard was preparing for this week's announcement of three new Z-series workstation models: the Z400, Z600, and Z800.
HP briefed the reporters and analysts with all the key details of the products (the speeds and feeds, as we say), took us to visit a couple of HP's key customers in the area, and hosted presentations by software partners and more customers.
The new HP Z-Series workstations. (Credit: Hewlett-Packard)
The workstations are very nice, especially the Z600 and Z800: high-quality dual-processor systems based on Intel's newest Xeon 5500-series processors with specific adaptations to distinguish them from ordinary PCs. Even the Z400, though based on a more basic PC-like design, uses a single Xeon processor and provides two 16-lane PCI Express Gen2 slots.
The customer visits were well chosen: one at BMW Designworks and another at DreamWorks, the movie studio that just released Monsters vs. Aliens.
BMW Designworks actually assisted with the industrial design of the new HP workstations. They're handsome machines, but not exactly pretty--certainly not in the way Apple's Mac Pro is.
More importantly, however, the HP-BMW design is functionally superior. In about the same case size as the Mac Pro, HP's Z800 has room for more RAM, more expansion cards, and more disk drives. BMW also worked handles into the design, and they work better than Apple's.
The difference in RAM is quite substantial. It isn't just about the slot count (eight in the Mac Pro, twelve in the Z800) but even more about the fact that HP supports 16GB dual in-line memory modules (DIMMs), while Apple's machine goes only up to 4GB per slot. That's a maximum of 192GB for the HP and 32GB for the Mac.
To be fair, HP is merely promising to offer 16GB DIMMs by the end of 2009; you can't get them today. Apple rarely preannounces anything, so it's possible that the Mac Pro will support more RAM by then, but HP's advantage in slot count should keep it on top.
More RAM can often give more performance than a faster CPU, especially in memory-hungry engineering applications. If the software overflows the physical memory and must start using virtual memory, performance can plummet.
These are very nice machines. But they're also expensive. The Z800 starts at less than $2,000 (actually a good bit cheaper than the Mac Pro's entry price), but most buyers will aim higher. In fact, it's no big deal to spend $10,000 or more on a high-end workstation.
Does that seem like a lot of money to spend on a PC for business use at a time when many businesses are struggling? Quite the opposite, I think.
The truth is, the cost of a superior PC is almost trivial, compared with the value it can generate in the hands of a highly skilled designer.
HP tried to make this point in its presentations at the event, but it was very conservative in its figures. First, it assumed that the total cost per employee (including salary, benefits, office space, management overhead, etc.) was just $60 per hour, which is very low. Second, it shouldn't have been using a cost model at all!
The more useful basis for this analysis is revenue per employee, which can easily exceed $250 per hour for the kind of workers who can make effective use of a high-price workstation.
For an employee generating this kind of value, a $10,000 workstation justifies its purchase remarkably quickly. Even if the employee's productivity improves just 10 percent, the payback period is a mere 10 weeks.
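Here's a minimal sketch of that payback arithmetic, for anyone who wants to plug in their own numbers. The function name and the 40-hour work week are my assumptions for illustration; the $10,000 price, $250 per hour, and 10 percent improvement are the figures used above.

```python
def payback_weeks(price, value_per_hour, productivity_gain, hours_per_week=40):
    """Weeks until the extra value an employee generates covers the purchase price.

    hours_per_week=40 is an assumed standard work week, not a figure from HP.
    """
    # Extra value produced each week thanks to the productivity improvement.
    extra_value_per_week = value_per_hour * productivity_gain * hours_per_week
    return price / extra_value_per_week


# A $10,000 workstation, $250 per hour of revenue, 10 percent improvement.
weeks = payback_weeks(10_000, 250, 0.10)
print(f"Payback period: about {weeks:.0f} weeks")
```

Under these assumptions, the improvement is worth $25 per hour, or $1,000 per 40-hour week, which is where the 10-week figure comes from.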
It's worth thinking about what it takes to generate a 10 percent improvement in overall productivity. It isn't just a matter of computer performance, but performance helps. These new HP workstations are much faster than the older models, due to the combination of the faster CPUs, faster and more RAM, and a new generation of professional graphics cards from Nvidia and Advanced Micro Devices' ATI.
Performance affects productivity mainly through the time the user spends waiting for the computer, so that's what to look for. Even assuming that the software is working as well as it can and the user's work habits are reasonable, processing delays for engineering visualizations, animation previews, circuit simulations, and similar tasks can really add up.
So it's no surprise to me that there's still a market for pricey dual-processor workstations.
What does surprise me is that there aren't more companies trying to rebuild the market for super high-end workstations.
SGI, in its glory days, used to be able to sell some pretty amazing machines for professional users. I have an SGI Octane workstation that originally sold for over $50,000. That seems like crazy money, but even a $50,000 workstation in the right hands could still pay for itself in less than a year, a reasonable return on investment.
Alas, SGI went bankrupt again this week and then promptly sold itself to Rackable Systems for $25 million plus the assumption of SGI's debts.
I'm sad that SGI is gone, but it wasn't the workstation business that killed the company, and the numbers show that market niche still exists. HP could occupy that niche, if it chose, as could any company that makes four- and eight-processor servers, which share most of the same engineering issues.
Some small companies, such as Boxx Technologies (which I wrote about last summer in "Boxx fills in for a failing SGI") and HPC Systems, make bigger workstations, but both of these vendors' product lines are stuck with AMD Opteron processors at the moment, which are no longer performance-competitive with the new Xeons.
Later this year, new multiprocessor-capable Xeon processors will arrive that could reinvigorate the super-workstation market, and I hope that some of these companies step up to the challenge. I believe that there's some good money to be made there, and the rest of the world economy will benefit at the same time.
As has been widely reported (for example, by EDN Magazine and both Brooke Crothers and Dan Ackerman here at CNET), Intel has delayed the first customer shipments (FCS) of its "Montevina" chipsets, part of the new Centrino 2 platform.
The delays are pretty short, however... a matter of just a few weeks.
Intel attributes the delays to two independent problems: one with FCC certification of the 802.11n WiFi feature in the chips (just "paperwork," Intel says), and one with the integrated graphics engines in some models.
Intel's probably right about the WiFi certification problem. I've been through the FCC certification process, at least for electromagnetic interference (EMI), and there sure is a lot of paperwork involved.
For the graphics problem, I see a couple of possible explanations.
Intel could have discovered a design flaw in the first production units severe enough to prevent them from being shipped, which would have caused a substantial delay while a new run of production units was completed. (See my earlier blog post, "Design flaws, defects, and faults", for an explanation of how design flaws are related to product defects and faults.) This delay would have been largely hidden by the usual rounds of testing, but perhaps it just used up a little more time than the slack that was available in the schedule.
Or perhaps there was a design or manufacturing flaw that didn't require trashing the first production run, but which did require some additional testing and qualification to reject specific problematic parts. This could be caused by slower or hotter operation than expected, for example. Such a problem would cause a shorter delay-- just the extra testing time. A statement from Intel in the Crothers post referring to "re-screening" suggests this is the situation here, although potentially that statement could also describe testing a second production run to ensure the problem has been solved.
I find it interesting that this problem is related to Intel's new graphics engine, which is certainly the most important element of the new chipset. Intel's previous integrated graphics products have been criticized, including by Microsoft itself, for not really being up to the challenges of running Windows Vista. Due to pressure from Intel, however, Microsoft certified these chips as "Vista Capable." That's technically true--I've used integrated-graphics platforms under Vista myself--but the resulting shortfalls in performance and features probably discouraged many new Vista users.
Graphics engines are very complicated, and getting more complicated every year. Intel started out well enough in the graphics business when it worked with Real3D (now defunct) to develop the Intel740, a discrete graphics chip, but 18 months later it found itself already 18 months behind ATI and NVIDIA, and fell back to selling only integrated-graphics chipsets, where the graphics component is worth only a few dollars in incremental revenue.
Intel plans to get back into the market for discrete graphics chips in 2009 or (more likely) 2010 with "Larrabee", a multi-core CPU in which some cores are optimized for graphics processing. I think Larrabee will turn out to be a technical disaster, but Intel has leveraged its market domination to turn previous technical disasters into financial windfalls. Think of the Pentium 4's "Hyper-Pipelined" design, for example, which was too hot and too inefficient, ultimately forcing Intel to bring its predecessor, the P6 design, back from the grave several years later. Intel's current graphics engines, however, are barely worth selling today, and they won't be worth reviving after Larrabee has run its course.