Nvidia and Advanced Micro Devices' ATI division are taking different approaches to graphics processing in the next generations of their products. Both strategies have strengths and weaknesses, and I think it's too soon to pick the eventual winner in this long-running fight.
Before I get into my analysis, I should say that Nvidia paid me to write a white paper on the implications of its new GPU architecture (code-named Fermi) for high-performance computing applications. The white paper was released as part of the Fermi launch event at Nvidia's GPU Technology Conference last week.
Nvidia also paid for white papers from two other well-known microprocessor analysts, Nathan Brookwood of Insight64 and my friend and former colleague Tom Halfhill of Microprocessor Report. UC Berkeley professor David Patterson wrote a fourth white paper, and Nvidia wrote one of its own. Each of these works takes a different approach to the subject; all are worth reading if you need to understand what Fermi is all about.
In short, I think the Fermi architecture has been more thoroughly white-papered than any graphics chip design in history. All five of these documents are available on the Fermi home page on Nvidia's Web site, and just in case that page is moved or changed, you're welcome to take advantage of my own mirror of my white paper.
I've spent much of the last several days reading these documents plus David Kanter's excellent article on Fermi over on his Real World Technologies site. David managed to get some details on Fermi that Nvidia didn't give to the rest of us.
I've also had time to go through the coverage of ATI's recent launch of the RV870, which is what Nvidia's Fermi-based chips will be competing against. The first of Nvidia's chips bears the internal code name of GF100, and it's huge. Here's a life-size photo:
Just kidding, of course. The actual chip is much smaller! But it does contain 2.9 billion transistors, which is a pretty amazing figure for a relatively affordable consumer product. Intel doesn't make a CPU with that many transistors. Intel's 8-core Nehalem EX, due out next year, has only 2.3 billion.
ATI's RV870 is more modestly sized at 2.2 billion transistors, but it has more processing cores: 1,600 vs. the GF100's 512. On the other hand, Nvidia spent some of its transistor budget boosting the clock rate of the GF100 to around twice that of the RV870. Nvidia hasn't released any numbers yet, but Brookwood estimates the GF100 cores will run at about 1.5 GHz; ATI says the RV870 will be clocked at 850 MHz.
If Brookwood is right, the theoretical maximum throughput of the GF100 on single-precision floating-point operations works out to 1.536 TFLOPS (1,536 GFLOPS), whereas ATI announced a figure of 2.72 TFLOPS (2,720 GFLOPS) for the RV870.
On double-precision data, the ranking is reversed. Nvidia would have 768 GFLOPS, and ATI, 544 GFLOPS. Nvidia may also have advantages in memory bandwidth, since its design has more memory close to the cores and six channels of GDDR5 memory rather than ATI's four. ATI's programming model may also lead to slightly lower overall efficiency.
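For readers who want to check the arithmetic, here is a rough sketch of how peak-throughput figures like these are typically derived: cores times clock rate times floating-point operations per clock. The factor of two operations per clock (one fused multiply-add per core per cycle) is my assumption, not something either company's numbers spell out here, and the double-precision divisors are simply the ratios implied by the figures quoted above. The GF100 clock is Brookwood's estimate, not an Nvidia specification.

```python
# Back-of-the-envelope peak throughput: cores x clock x FLOPs per clock.
# Assumption (not stated in the article): each core retires one fused
# multiply-add per clock, counted as 2 floating-point operations.
FLOPS_PER_CLOCK = 2


def peak_gflops(cores, clock_mhz, dp_divisor=1):
    """Theoretical peak in GFLOPS; dp_divisor scales down for double precision."""
    return cores * clock_mhz * FLOPS_PER_CLOCK / 1000 / dp_divisor


# name: (cores, clock in MHz, double-precision divisor implied by quoted figures)
chips = {
    "GF100 (estimated clock)": (512, 1500, 2),
    "RV870": (1600, 850, 5),
}

for name, (cores, mhz, dp_div) in chips.items():
    sp = peak_gflops(cores, mhz)
    dp = peak_gflops(cores, mhz, dp_div)
    print(f"{name}: {sp:.0f} GFLOPS single, {dp:.0f} GFLOPS double")
```

These are theoretical ceilings only. They say nothing about how much of that throughput real applications will see, which depends on memory bandwidth and on how well the programming model keeps the cores busy.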
While we wait for more data from Nvidia, not to mention real-world benchmark results, that's about as far as we can take the technical comparisons.
But even these scant data shed enough light to show some interesting contrasts between the two companies' respective chips.
To me, it looks like ATI remains tightly focused on consumer 3D applications, primarily games. ATI's FirePro line of professional graphics cards has been growing in popularity, but it still represents small change for the company and I don't see that changing dramatically with the RV870.
Nvidia also makes most of its money from gamers--more money than ATI does--but the design choices embodied in the Fermi architecture show how aggressively Nvidia is working to seize control of the nascent market for GPU-based computing.
ATI and Nvidia started out roughly equal in this market when so-called general-purpose GPU (GPGPU) computing got started. I hosted a panel at the GP2 Workshop over five years ago, and nobody had any real idea how the market would develop.
Since then, Nvidia's Tesla products have become the dominant GPU computing hardware platform and Nvidia's CUDA architecture has become the dominant software platform. Even the other two major software platforms (OpenCL and Microsoft's DirectCompute) are based on CUDA in significant ways.
These developments do not bode well for ATI, or for AMD, since AMD has made Fusion--the integration of CPU and GPU computing--the central element of its long-term strategy. AMD's saving grace here is that Intel is actively resisting this kind of integration, persisting in the belief that an enhanced CPU such as Larrabee can entirely eliminate the need for GPUs.
But that's the long run. In the short run, ATI has to build demand for its graphics chips by making them better graphics chips, and the RV870 certainly does that. For one thing, a graphics chip you can actually buy is infinitely better than one you can't, and ATI has a critical schedule advantage over Nvidia.
The GF100 won't be out in time for the holiday buying season, but ATI's already shipping the RV870. As long as ATI can deliver its new chip in volume, it should do very well over the next few months. Nvidia doesn't seem concerned, pointing to its recent history of market-share leadership, but gamers can be fickle. They tend to buy what's best when they're ready to buy, rather than waiting for something potentially better to come along.
Nvidia may be betting on GPU computing for future growth, but like ATI, it needs sales today to support product development tomorrow. I'm looking for better models of Nvidia's current GT200 line of graphics chips to maintain market share through the end of the year, and for the GF100 to reach customers as soon as possible thereafter.