June 18, 2004 4:00 AM PDT

Supercomputer ranking method faces revision

An effort by chipmakers to move beyond a single, simple speed measurement as a gauge of computing performance is now being matched by a similar push in the world of supercomputers--and not everyone is applauding the change.

News.context

What's new:
An organizer of the Top500 supercomputer rankings has produced a broader test suite that measures multiple dimensions of a machine's performance.

Bottom line:
The government-sponsored test suite, called the HPC Challenge Benchmark, has pleased some supercomputer makers, such as Cray. But IBM, which is moving aggressively into the supercomputing market and is featured more prominently on the Top500, is more cautious.


An organizer of the Top500 supercomputer rankings has produced a broader test suite that measures multiple dimensions of a machine's performance. By comparison, a mathematical test called Linpack is currently used to rank systems on the Top500 list, which is released twice a year with much fanfare.

"For a long time it's been clear to all of us (that) we needed to have more than just Linpack," said Jack Dongarra, a University of Tennessee professor who helped create Linpack and who's now working on a suite of tests that go beyond pure number-crunching prowess. "No single number can reflect the overall performance of a machine."

In the world of desktop PCs, where increasing a chip's clock speed by 20 percent rarely yields a 20 percent overall system boost, a similar shift away from simple but potentially misleading measurements has already occurred. Chipmaker AMD began moving away from gigahertz-based labeling in 2002. And this spring, Intel made a similar move for its Pentium and Celeron chips.

The government-sponsored test suite for supercomputers, called the HPC Challenge Benchmark, has pleased some supercomputer makers, such as Cray. But IBM, which is moving aggressively into the supercomputing market and is featured more prominently on the Top500, is more cautious.

The new suite of seven tests won't replace Linpack as the Top500 yardstick, Dongarra said. For one thing, the decades-old Linpack permits historical comparisons in high-performance computing, or HPC; for another, a system that can't achieve a high Linpack score won't do well on the other tests, he said.

The new tests grew out of a program the United States government launched after being spooked by a Japanese supercomputer called Earth Simulator, which has topped the Top500 since 2002. The program, funded by the Defense Advanced Research Projects Agency (DARPA), has awarded grants to IBM, Cray and Sun Microsystems to develop new supercomputer designs.

"It was done for DARPA and the National Science Foundation and the Department of Energy. They wanted something to measure the overall effectiveness of computers designed for the program, and they realized that Linpack was not good enough," Dongarra said.

The next Top500 list is scheduled for release Sunday as the International Supercomputer Conference begins in Heidelberg, Germany.

It's not the first time the benchmark suite idea has been raised. Erich Strohmaier, another Top500 organizer, endorsed a composite test in 2000 to supplement Linpack.

New tests, new fans
Some companies are eagerly promoting the new test suite--in particular, Cray. Five of Cray's X1 systems lead one test, which measures memory transfer speed--a contrast to the company's comparatively unflattering showing on the Top500 list.

"Customers are always going to want to run their particular codes, but it gives a good understanding about how a system performs in different areas," said Stephen Sugiyama, a Cray marketing manager. "They've done a lot of work to pick a few characteristics about systems that matter to customers."

Of the Linpack-based Top500, Sugiyama said, "It's a nice census of very high-performance systems, but when it's used to rank systems, it's not necessarily a good ranking."

Cray has specialized in supercomputing for years. IBM, though, is trying to adapt its general-purpose business servers to the market, and it has had substantial success with Unix servers and with clusters of small Linux computers joined by a high-speed network. And Big Blue is more skeptical.

"Everyone understands Linpack for what it is and what it isn't. No one understands these additional benchmarks in terms of what they are and what they are not," said Dave Turek, leader of IBM's "Deep Computing" team. "The hazard is thinking that more benchmarks is more illumination. It might just generate more degrees of confusion."

Many of the tests represent extreme and potentially unusual computing challenges, and it's not yet clear how well they align with actual customer workloads, Turek said. IBM recommends that customers try out their own software before buying a system.

In addition, the benchmark is skewed to reflect the interests of specific government agencies, Turek said, alluding to intelligence organizations such as the FBI, CIA or the National Security Agency.

"Three-letter agencies that all have different kinds of views in terms of what they see as important--they have all stuck something in there to accommodate their kinds of needs," Turek said.

New dimensions
Linpack measures how fast a system can solve a large, dense system of linear equations--a test that gauges processor performance well but says little about other aspects of a supercomputer. For example, it doesn't address how fast data is transferred to or from memory or disk storage systems.
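
To make the idea concrete, here is a minimal sketch, in C, of what a Linpack-style measurement does: time the solution of a dense linear system Ax = b and convert the elapsed time into a floating-point operation rate. The real benchmark is far larger and highly tuned; the matrix size and random test data below are arbitrary, illustrative choices.

```c
/* A minimal sketch of a Linpack-style measurement: time the solution
   of a dense linear system Ax = b, then convert the elapsed time into
   a floating-point operation rate. Purely illustrative. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 512  /* illustrative only; real runs use much larger systems */

static double a[N][N], b[N];

int main(void) {
    /* Build a diagonally dominant system so elimination without
       pivoting stays numerically safe for this toy example. */
    srand(1);
    for (int i = 0; i < N; i++) {
        b[i] = rand() / (double)RAND_MAX;
        for (int j = 0; j < N; j++)
            a[i][j] = rand() / (double)RAND_MAX + (i == j ? N : 0.0);
    }

    clock_t start = clock();

    /* Gaussian elimination (forward elimination). */
    for (int k = 0; k < N - 1; k++)
        for (int i = k + 1; i < N; i++) {
            double m = a[i][k] / a[k][k];
            for (int j = k; j < N; j++)
                a[i][j] -= m * a[k][j];
            b[i] -= m * b[k];
        }

    /* Back substitution. */
    for (int i = N - 1; i >= 0; i--) {
        double s = b[i];
        for (int j = i + 1; j < N; j++)
            s -= a[i][j] * b[j];
        b[i] = s / a[i][i];
    }

    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    /* Standard Linpack operation count for an N x N solve. */
    double flops = (2.0 / 3.0) * N * N * N + 2.0 * N * N;
    if (secs > 0.0)
        printf("N=%d: %.3f s, %.1f MFLOP/s\n", N, secs,
               flops / secs / 1e6);
    return 0;
}
```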

And though Linpack tests a type of math called "floating-point" calculations, which involve a continuous spectrum of numbers, it doesn't test "integer" operations, which involve whole numbers. Integer operations are used in problems such as processing genetic sequences.
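
As a toy contrast between those two kinds of work--not anything drawn from the benchmark itself--the first loop below is a typical floating-point kernel, while the second does the sort of integer comparison work that sequence matching involves; the DNA-like strings are made up for the example.

```c
/* A toy contrast between floating-point and integer workloads,
   not part of any real benchmark. */
#include <stdio.h>

int main(void) {
    /* Floating-point: accumulate a dot product over real numbers. */
    double x[8] = {0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5};
    double dot = 0.0;
    for (int i = 0; i < 8; i++)
        dot += x[i] * x[i];

    /* Integer: count positions where two DNA-like strings agree.
       This is comparisons and counters -- no floating-point math. */
    const char *s1 = "GATTACAGATTACA";
    const char *s2 = "GATTTCAGATAACA";
    int matches = 0;
    for (int i = 0; s1[i] != '\0' && s2[i] != '\0'; i++)
        if (s1[i] == s2[i])
            matches++;

    printf("dot = %.2f, matches = %d\n", dot, matches);
    return 0;
}
```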

The HPC Challenge Benchmark suite, in contrast, includes tests such as Stream, which measures how fast data can be transferred from memory to a processor; Ptrans, which measures how fast one processor in a supercomputer can communicate with another; b_eff, which measures the response time and data capacity of a network; and DGEMM, which multiplies one array of numbers, called a matrix, with another.
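
The following is a rough sketch, in C, of the kinds of kernels two of those tests are built around: a Stream-style "triad" loop, whose speed is set by memory bandwidth, and a naive DGEMM-style matrix multiply, which in tuned form is limited by floating-point throughput. The sizes and data are arbitrary; the real tests use carefully chosen problem sizes and optimized code.

```c
/* Sketches of kernels in the spirit of Stream and DGEMM.
   Illustrative only; sizes and data are arbitrary. */
#include <stdio.h>

#define M 1000000   /* triad vector length */
#define N 64        /* matrix dimension    */

static double a[M], b[M], c[M];
static double A[N][N], B[N][N], C[N][N];

int main(void) {
    /* Stream-style triad: a[i] = b[i] + scalar * c[i].
       Three memory streams per iteration and little arithmetic, so
       its speed reflects memory bandwidth, not processor speed. */
    const double scalar = 3.0;
    for (long i = 0; i < M; i++) {
        b[i] = 1.0;
        c[i] = 2.0;
    }
    for (long i = 0; i < M; i++)
        a[i] = b[i] + scalar * c[i];

    /* DGEMM-style multiply: C = A * B.
       O(N^3) multiply-adds over O(N^2) data, so a good implementation
       is limited by the floating-point units, not memory. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = 1.0;
            B[i][j] = (i == j) ? 1.0 : 0.0;  /* identity matrix */
        }
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }

    printf("triad a[0] = %.1f, C[0][0] = %.1f\n", a[0], C[0][0]);
    return 0;
}
```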

The benchmark software runs all the tests simultaneously, Dongarra said, so manufacturers won't be able to run just one test or another. However, because the tests measure different aspects of a system, it's not meaningful to wrap the seven results into a single composite score, he added.

Of the seven tests in the suite, only five are currently measured, and changes or additions are possible, according to the benchmark's Web site.

Meanwhile, the Top500 isn't going away, despite its imperfections.

"It clearly has a place. It does attract a lot of attention in using this one number to rate machines. There are some bragging rights that go along with it," Dongarra said.

4 comments

I wonder why...
I wonder why AMD and Intel (least of all, Intel) are always credited with breaking away from the "MHz myth," when it was Apple who fought tooth and nail to convince the market and industry analysts that MHz isn't everything.

Then I remember that it's C|Net, and they'd discredit Apple even if they discovered cold fusion. :/
Posted by olePigeon (39 comments )
I say let them develop and use the HPC Challenge Benchmark.
IBM suggests that no one understands the HPC Challenge Benchmark and that Linpack should remain the test of choice. There is also mention that customers can simply try their software out on the system before they buy. That would be a little hard, since a lot of the systems are built to order and are still on paper (IBM has been researching Blue Gene/L since 1999, and it is still in development). I suppose that IBM could offer a refund if, once they build the system, it does not perform in accordance with its theoretical peak performance, although I doubt it, since IBM has poured $100 million into Blue Gene's development.

Additionally, since the government funds these large systems, it should be free to design tests that exercise the entire architecture before investing millions into any one program. Research facilities are in such a rush to have the fastest system, and one of the main factors in deciding on a particular product is the speed of current systems in the marketplace. So the Top500 list becomes a player in the decision, and currently the test of choice is Linpack. The Linpack test has been used to set the bar and to judge our latest and greatest systems. At some point we stopped thinking and became focused on developing machines that score favorably on the Linpack ratings. While this was happening, Japan developed the Earth Simulator. In short, while we were beating our chests, we were getting kicked in the tail. I say let them develop and use the HPC Challenge Benchmark.
Posted by (1 comment )
It's amazing how guy things always get reduced to the same thing ...
After reading the story, it slowly dawned on me that all the fuss about the newer tests was about getting away from the ubiquitous, macho male thing of 'bigger is better'. The Linpack test gives out a simple number, something that marketing guys can get both hands around - that they don't have to think about. Either yours is bigger or it's not. It doesn't matter how much money is involved - it always comes down to the same juvenile, puerile nonsense.
Posted by steve_dupuis (6 comments )
 
