Speeds and Feeds

October 14, 2009 5:55 AM PDT

The factor factor, part 3

by Peter Glaskowsky

In part 1 and part 2 of this series, I claimed that there is apparently a secret rule in the microprocessor industry that determines the success--or failure--of new chip designs.

The failures included RISC processors, media processors, and intelligent RAM chips, which all sank in spite of clearly demonstrable advantages over alternative solutions. The great success is the programmable graphics processing unit (GPU), which has succeeded in spite of the sometimes wrenching shifts in programming methods and PC system architecture that have been required to support it.

So what's the secret? Simply this: a factor-of-two advantage, even if it's an inherent, persistent advantage, isn't enough to unseat an incumbent solution in the face of even the mildest competitive disadvantage. Without a factor of 10--a full order of magnitude--a new product won't even get a foot in the door.

That's why I call this rule the "factor factor." It isn't enough to be a few times faster than the existing alternatives. Given the performance consequences of Moore's Law, it's easier for your potential customers to wait a few years rather than spend a few years adapting to your "issues." You need to be much faster than the products you're trying to replace. The target factor is 10--no less.

Sometimes, even a tenfold advantage isn't enough. One order of magnitude is enough to overcome one disadvantage, such as a change of programming methods. Add another simultaneous disadvantage, however, like the serious constraint in local memory capacity imposed by the IRAM concept, and the new technology may need a factor of 100 in performance to win a place in the market.

Overall, a new product must deliver net benefits amounting to as much as a full order of magnitude in cost, performance, or productivity to compensate for each significant disadvantage. That's just what it takes to motivate customers to deal with the problems rather than waiting for Moore's Law to speed up the solutions that are already familiar to them.
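To put rough numbers on that waiting game, here is a minimal Python sketch. It assumes the incumbent's performance doubles every two years, which is a common reading of Moore's Law but is my assumption, not a figure from this post. The function name `years_to_catch_up` is likewise hypothetical.

```python
import math


def years_to_catch_up(advantage, doubling_period_years=2.0):
    """Years until an incumbent erases a newcomer's speed advantage.

    Assumes (hypothetically) that the incumbent's performance doubles
    every `doubling_period_years` while the newcomer stands still.
    """
    return doubling_period_years * math.log2(advantage)


for factor in (2, 10, 100):
    years = years_to_catch_up(factor)
    print(f"{factor}x advantage: erased in about {years:.1f} years")
```

Under that assumption, a factor-of-two advantage lasts about two years, while a factor of 10 takes roughly six and a half years to erode. That is long enough to make switching worth the trouble.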

The introduction of the AMD64 instruction set by Advanced Micro Devices (also known as EM64T or "Intel 64" on Intel processors, or generically as x86-64) represents the ultimate success case for the factor factor.

AMD's Athlon 64 debuted the AMD64 instruction-set architecture.

(Credit: Advanced Micro Devices)

This isn't immediately clear, I suppose. Adopting the AMD64 standard required a lot of work by operating system vendors and software developers, and the performance benefit was relatively mild in most cases. But still, AMD64 was an immediate success because the performance benefit in certain applications--those that simply wouldn't fit into a 32-bit address space--was practically infinite.

Although the factor factor seems obvious--or at least it should--ignoring it is still at the heart of many failed products and hundreds of millions of dollars of wasted investments every year.

In Silicon Valley, as in other chip-design centers around the world, projects rarely fail because of poor execution. In most projects, the engineers are good at their jobs, the managers are good at coordinating their work, and the investment is sufficient to get the work done.

Most projects fail at the conceptual level, before the detail design work even begins. The factor factor is only one of many reasons for these failures, of course, but it's the one that disturbs me the most because it's the easiest to anticipate.

This rule doesn't apply to all products. When a new chip for an existing market is architecturally compatible with previous products, a factor-of-two performance improvement is plenty. Even smaller benefits can justify the costs of developing a new product if there are few, if any, disadvantages associated with it.

Multicore CPUs are one of these products, at least for now. Process technology makes it pretty easy to double core counts. Dual-core CPUs were almost a drop-in replacement for single-core chips and caused no serious problems. Quad-core chips were the same thing again. Eight-core CPUs may be a lesson in diminishing returns, but I'm sure they'll be commercially successful.

Beyond that, we'll have to see how it goes. The critical advantage of the CPU over the GPU is high performance on inherently serial processing tasks (what we sometimes call "single-threaded applications"). On a typical PC, there are rarely more than a few of these tasks running at any given moment. It's always useful to have a few extra cores available for parallel tasks, but at some point (I'm thinking somewhere around the 16-core level), PC buyers are likely to stop paying extra for more cores.

Even mighty Intel could find itself on the wrong side of the factor factor. Given that quad-core chips became a mainstream product just this year, we can expect to see 16-core processors for ordinary desktop PCs in 2013 and laptops in 2015 or so. By that time, the GPU could be the incumbent solution for high-performance parallel processing, and multicore CPUs could be the technology looking for compelling performance advantages.

So...now you know the supposed secret. When you hear about a radical new microprocessor architecture, you can do what I do: imagine the numeral "1" followed by a "0" for each drawback you see in the proposal. Compare that figure with the claimed benefits and you'll know which way to bet.
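For readers who prefer to see the rule written down, here is a small Python sketch of that mental arithmetic. The function names and the sample numbers are hypothetical illustrations, not data from any real product. The only thing taken from this series is the rule itself: one order of magnitude per significant drawback.

```python
def required_factor(drawbacks):
    """The numeral 1 followed by one 0 per drawback: 10 ** (number of drawbacks)."""
    return 10 ** len(drawbacks)


def which_way_to_bet(claimed_benefit, drawbacks):
    """Return 'for' if the claimed benefit meets the bar, else 'against'."""
    return "for" if claimed_benefit >= required_factor(drawbacks) else "against"


# Hypothetical proposals, for illustration only.
proposals = [
    ("compatible dual-core CPU", 2, []),
    ("IRAM-style chip", 10, ["new programming methods", "limited local memory"]),
    ("64-bit addressing for apps that can't fit in 32 bits", float("inf"),
     ["OS and software porting work"]),
]

for name, benefit, drawbacks in proposals:
    needed = required_factor(drawbacks)
    print(f"{name}: needs {needed}x, claims {benefit}x -> bet {which_way_to_bet(benefit, drawbacks)}")
```

With no drawbacks the bar is only 1x, which is why a compatible chip can succeed with a factor of two. With two drawbacks the bar is 100x, so a tenfold claim falls short.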

By the way, kudos to CNET users divisionbyzero and TrinityTrident, who proved my point that this rule isn't really a secret by explaining it in their comments on the previous posts in this three-part series.

Now if someone could only explain why so many companies don't seem to know this rule!

October 13, 2009 8:01 AM PDT

The factor factor, part 2

by Peter Glaskowsky

In the first part of this series, I claimed that a great secret in the microprocessor industry largely determines whether new products succeed or fail.

I noted that this secret shouldn't be a secret at all because many people (including myself) have talked about it over the years, but clearly a lot of people are in the dark because they continually disregard it and develop products that are doomed.

I gave several examples of products that failed because their creators didn't know the great secret. Those products included RISC processors, media processors, and intelligent RAM chips, in which processor cores were integrated with memory to eliminate one of the great bottlenecks in computer performance.

During my eight years at Microprocessor Report, I covered the markets for media processors, 3D-graphics chips, network processors, and what I called "extreme processors"--chips with large numbers of simple cores running in parallel. Many of these chips were cheaper, easier to design, and twice as fast as competing products--and still failed.

However, some did succeed. The critical factor that made the difference in most of these cases is the essence of the so-called secret.

One of those successes is the graphics processing unit, or GPU.

I was reminded again of the secret at Nvidia's recent GPU Technology Conference, where many of the talks dealt with GPU computing.

(Disclosure: I recently wrote a technical white paper for Nvidia.)

Although the GPU field dates back only five or six years, GPUs have already earned a place alongside CPUs. Each is clearly superior for certain kinds of applications.

This is true in spite of the fact that GPUs aren't nearly as easy to program as CPUs. Like other forms of parallel programming, GPU programming requires new hardware (the GPU itself), significant new extensions for programming languages, and a different mindset for programmers--one that simply wasn't part of the standard computer-science curriculum for most of the last 50 years.

...

October 7, 2009 8:07 AM PDT

ATI and Nvidia face off--obliquely

by Peter Glaskowsky

Nvidia and Advanced Micro Devices' ATI division are taking different approaches to graphics processing in the next generations of their products. Both strategies have strengths and weaknesses, and I think it's too soon to pick the eventual winner in this long-running fight.

Before I get into my analysis, I should say that Nvidia paid me to write a white paper on the implications of its new GPU architecture (code-named Fermi) for high-performance computing applications. The white paper was released as part of the Fermi launch event at Nvidia's GPU Technology Conference last week.

Nvidia also paid for white papers from two other well-known microprocessor analysts, Nathan Brookwood of Insight64 and my friend and former colleague Tom Halfhill of Microprocessor Report. UC Berkeley professor David Patterson wrote a fourth white paper, and Nvidia wrote one of its own. All of these works take a different approach to the subject; all are worth reading if you need to understand what Fermi is all about.

In short, I think the Fermi architecture has been more thoroughly white-papered than any graphics chip design in history. All five of these documents are available on the Fermi home page on Nvidia's Web site, and just in case that page is moved or changed, you're welcome to take advantage of my own mirror of my white paper.

I've spent much of the last several days reading these documents plus David Kanter's excellent article on Fermi over on his Real World Technologies site. David managed to get some details on Fermi that Nvidia didn't give to the rest of us.

I've also had time to go through the coverage of ATI's recent launch of the RV870, which is what Nvidia's Fermi-based chips will be competing against. The first of Nvidia's chips bears the internal code name of GF100, and it's huge. Here's a life-size photo:

...

August 10, 2009 5:01 AM PDT

A new view of 3D graphics

by Peter Glaskowsky

Have we reached the end of the road for conventional 3D rendering?

Siggraph 2009 ended Friday, and I've spent the last few days digesting what I learned there. Although I've been involved in the graphics industry since 1990 and I've attended Siggraph most years since 1992, a crisis of sorts seems to have snuck up on me.

At the High Performance Graphics conference before the main show, keynote speeches from Larry Gritz of Sony Pictures Imageworks and Tim Sweeney of Epic Games showed that traditional 3D-rendering methods are being augmented and even supplanted by new techniques for motion-picture production as well as real-time computer games.

Gritz reckoned that 3D became a fully integrated element of the moviemaking process in 1989 when computer-generated characters first interacted with human characters in James Cameron's "The Abyss."

Gritz described how Imageworks has moved to a new ray-tracing rendering system called "Arnold" for several films currently in production, replacing the Reyes (Renders Everything You Ever Saw) rendering system, probably the most widely used technology in the industry.

According to Gritz, Reyes rendering led to unmanageable complexity in the artistic component of the production process, outweighing the render-time advantages of the Reyes method. But Gritz says even these advantages diminished as the demand for higher quality drove Imageworks to make more use of ray tracing and a sophisticated lighting model called global illumination.

The bottom line for Imageworks is that Arnold, which was licensed from Marcos Fajardo of Solid Angle, takes longer to do the final rendering, but is easier on the artists and makes it easier to create the models and lighting effects--a net win.

Sweeney echoed this theme the next day, which surprised me considering Sweeney's focus is real-time rendering for 3D games--notably with Epic's Unreal Engine, which has been used in hundreds of 3D games on all the major platforms. Game rendering uses far less sophisticated techniques because each frame has to be rendered in perhaps one-sixtieth of a second, not the four or five hours on average that can be devoted to a single frame of a motion picture.
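The gap between those two budgets is easier to appreciate as a ratio. Here is a quick Python sketch that uses only the figures quoted above (60 frames per second for games, four to five hours per frame for film). The function name is hypothetical.

```python
def film_to_game_ratio(film_hours_per_frame, game_fps=60):
    """How many game-frame time budgets fit into one film-frame budget."""
    film_seconds = film_hours_per_frame * 3600
    # A game frame gets 1/game_fps seconds, so dividing by that budget
    # is the same as multiplying by game_fps.
    return film_seconds * game_fps


for hours in (4, 5):
    print(f"{hours} hours per film frame = {film_to_game_ratio(hours):,} game-frame budgets")
```

That works out to roughly a million times more rendering time per frame for a film than for a game.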

It seems that Sweeney is also ...

June 17, 2009 5:01 AM PDT

GPUs and the new 'digital divide'

by Peter Glaskowsky

I spent Tuesday at Nvidia headquarters, attending the company's annual Analyst Day.

I've been to most of Nvidia's analyst events over the last decade or so, since I covered Nvidia almost from its inception while working as the graphics analyst at Microprocessor Report. These meetings are always a good way to get an update on the company's business operations, and sometimes--like this time--one provides exceptionally good insight into larger industry trends.

Nvidia's GeForce GTX 280 graphics chip

(Credit: Nvidia)

Nvidia has had a rough couple of quarters in the market, which CEO Jen-Hsun Huang blamed in part on a bad strategic call in early 2008: to place orders for large quantities of new chips to be delivered later in the year. When the recession hit, these orders turned into about six months of inventory, much of which simply couldn't be sold at the usual markup.

In response, Nvidia CFO David White outlined measures the company plans to take to increase revenue, sell a more valuable mix of products, reduce the cost of goods sold, and cut back on Nvidia's operating expenses.

Three things stood out for me in this presentation:

Nvidia is planning an aggressive transition to state-of-the-art ASIC fabrication technology at TSMC, the company's manufacturing partner. Within "two to three quarters," White said, about two-thirds of the chips Nvidia sells will be made using 40-nanometer process technology. (The first of these chips were announced Tuesday.)

White also acknowledged something that I've long assumed to be true: Nvidia receives "preferential allocation" on advanced process technology at TSMC. It's logical that Nvidia should get the red-carpet treatment, having been TSMC's best customer for many years, but I don't recall hearing Nvidia or TSMC put this fact on the record before.

The third notable point from White's presentation: the gross margins for Nvidia's Tegra, an ARM-based application processor, are much higher than I'd have guessed, at "over 45 percent." (Mike Rayfield, general manager of Nvidia's Tegra division, says Tegra has already garnered 42 design wins at 27 companies.) That's quite excellent for an ARM-based SoC; it's a very competitive market.

More surprises

The technical sessions at the event contained their own surprises.

For example, Nvidia effectively seized control of an old Intel marketing buzzword: "balanced."

For years, Intel used to talk about ...


June 16, 2008 6:01 AM PDT

The Gizmo Report: NVIDIA's GeForce GTX 280 GPU-- introduction

by Peter Glaskowsky

Today, NVIDIA officially announces its new GeForce GTX 200 family of graphics processing units (GPUs) and the first two products in the family, the GeForce GTX 280 and the GeForce GTX 260.

NVIDIA's GeForce GTX 280 graphics chip

(Credit: NVIDIA Corporation)

The GeForce GTX 280 is the new flagship of NVIDIA's GPU product line, taking over from last year's GeForce 9800 GTX. (The change in the product-name format from "9800 GTX" to "GTX 280" is potentially confusing and doesn't seem that useful to me, but I'm sure we'll get used to it over time. I suppose NVIDIA's other choice was to go with numbers above 10,000, which might have been even worse.)

NVIDIA disclosed the details of these products at an Editor's Day conference in May, and most of the attendees, including myself, received GTX 280 graphics cards for editorial review. These cards are NVIDIA reference boards, not retail products.

I'll be doing this review in multiple parts, each addressing a different aspect of these products and the effects they'll have on the PC graphics market.

First, an overview of the GTX 280 chip itself.

This is a huge chip. NVIDIA won't say exactly how large, and I'm not going to bust open the chip package on my reference board just to find out, but NVIDIA VP of technical marketing Tony Tamasi says ...

June 16, 2008 6:01 AM PDT

The Gizmo Report: NVIDIA's GeForce GTX 280 GPU-- gaming

by Peter Glaskowsky

Graphics performance improves rapidly. We can be confident that each new generation of graphics chips will be faster than the previous one, and that AMD and NVIDIA will regularly surpass each other with new product launches. I've been watching this process professionally since 1996, when I began covering graphics technology for Microprocessor Report.

NVIDIA's GeForce GTX 280 graphics chip

(Credit: NVIDIA Corporation)

As of today, NVIDIA is on top. The new GeForce GTX 280 is the fastest graphics chip you can get. See the first part of this review for details of the chip itself.

If you can get one, anyway. NVIDIA says boards based on the GeForce GTX 280 and its companion GeForce GTX 260 will be available "in quantity" tomorrow (June 17), but if previous launches are any indication, those quantities won't be enough to satisfy everyone.

And you may not be able to afford one--a GTX 280 board with 1GB of RAM will likely be priced around $649, while GTX 260 boards with 896MB will go for about $399. (The GTX 280 / 1GB board I tested was made by NVIDIA, so it isn't necessarily representative of commercial products.)

But avid gamers won't be discouraged by these prices. Both AMD and NVIDIA like to point out that an expensive graphics card is a much better investment than a high-end CPU or motherboard if you care about gaming.

The standard of comparison for gaming performance is the number of frames per second that can be rendered for a given combination of screen resolution and quality features... or, conversely, what resolution and features can be used without reducing the frame rate below a playable level.

So in my own testing, I used frame rate as the metric for games that could run acceptably with maximum quality at the maximum resolution of my monitor (1,600 x 1,200 pixels), and image quality as the metric for the other games.

I did my testing with four games: ...

August 16, 2007 5:01 AM PDT

Post-Siggraph book review: "GPU Gems 3"

by Peter Glaskowsky

As I described in my recent blog entries about Siggraph 2007, there's a lot of cool stuff going on in hardware and software development for graphics processors (GPUs).

GPUs are programmable devices like the more familiar CPUs, but the programming model is very different. A CPU core implements a simple linear model; programs consist of one instruction after another, though a good CPU scans the instruction stream for opportunities to execute a few instructions in parallel. A busy GPU, on the other hand, always ...

About Speeds and Feeds

Silicon Valley-based computer architect and chip analyst Peter N. Glaskowsky attends a variety of industry conferences throughout the year to meet with industry thought leaders and dig into the future of computing technology. In Speeds and Feeds, he analyzes trends in system architecture and interface design, as well as market and political pressures surrounding those trends. He is a member of the CNET Blog Network and is not an employee of CNET. Disclosure.