June 18, 2004 4:00 AM PDT
Supercomputer ranking method faces revision
An organizer of the Top500 supercomputer rankings has produced a broader test suite that measures multiple dimensions of a machine's performance. By comparison, a mathematical test called Linpack is currently used to rank systems on the Top500 list, which is released twice a year with much fanfare.
"For a long time it's been clear to all of us (that) we needed to have more than just Linpack," said Jack Dongarra, a University of Tennessee professor who helped create Linpack and who's now working on a suite of tests that go beyond pure number-crunching prowess. "No single number can reflect the overall performance of a machine."
In the world of desktop PCs, where increasing a chip's clock speed by 20 percent rarely yields a 20 percent overall system boost, a similar shift away from simple but potentially misleading measurements has already occurred. Chipmaker AMD in 2002 began discarding the gigahertz labeling system. And this spring, Intel made a similar move for its Pentium and Celeron chips.
The government-sponsored test suite for supercomputers, called the HPC Challenge Benchmark, has pleased some supercomputer makers, such as Cray. But IBM, which is moving aggressively into the supercomputing market and is featured more prominently on the Top500, is more cautious.
The new suite of seven tests won't replace Linpack as the Top500 yardstick, Dongarra said. For one thing, the decades-old Linpack permits historical comparisons in high-performance computing, or HPC. For another, a system that can't get a high Linpack score won't do well on other tests, he said.
The new tests grew out of a program the United States government launched after being spooked by a Japanese supercomputer called Earth Simulator, which has topped the Top500 since 2002. The program, funded by the Defense Advanced Research Projects Agency (DARPA), has awarded grants to IBM, Cray and Sun Microsystems to develop new supercomputer designs.
"It was done for DARPA and the National Science Foundation and the Department of Energy. They wanted something to measure the overall effectiveness of computers designed for the program, and they realized that Linpack was not good enough," Dongarra said.
The next Top500 list is scheduled for release Sunday as the International Supercomputer Conference begins in Heidelberg, Germany.
It's not the first time the benchmark suite idea has been raised. Erich Strohmaier, another Top500 organizer, endorsed a composite test in 2000 to supplement Linpack.
New tests, new fans
Some companies are eagerly promoting the new test suite--in particular, Cray. Five of Cray's X1 systems lead in one test, which measures memory transfer speed, in contrast to the company's comparatively unflattering presence on the Top500 list.
"Customers are always going to want to run their particular codes, but it gives a good understanding about how a system performs in different areas," said Stephen Sugiyama, a Cray marketing manager. "They've done a lot of work to pick a few characteristics about systems that matter to customers."
Of the Linpack-based Top500, Sugiyama said, "It's a nice census of very high-performance systems, but when it's used to rank systems, it's not necessarily a good ranking."
Cray has specialized for years in supercomputing. IBM, though, is trying to adapt its general-purpose business servers to the market, and has had substantial success with Unix servers and with clusters of small Linux computers joined by a high-speed network. And Big Blue is more skeptical.
"Everyone understands Linpack for what it is and what it isn't. No one understands these additional benchmarks in terms of what they are and what they are not," said Dave Turek, leader of IBM's "Deep Computing" team. "The hazard is thinking that more benchmarks is more illumination. It might just generate more degrees of confusion."
Many tests represent extreme and potentially unusual computing challenges, and it's not yet clear how well they align with actual customer work, Turek said. IBM recommends customers try out their software before buying a system.
In addition, the benchmark is skewed to reflect the interests of specific government agencies, Turek said, alluding to intelligence organizations such as the FBI, CIA or the National Security Agency.
"Three-letter agencies that all have different kinds of views in terms of what they see as important--they have all stuck something in there to accommodate their kinds of needs," Turek said.
Linpack measures how fast a system can solve a dense system of linear equations--a test that measures processor performance well but not other aspects of a supercomputer. For example, it doesn't address how fast data is transferred to or from memory or disk storage systems.
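To make the idea concrete, here is a minimal sketch of a Linpack-style measurement in Python. It is not the actual Linpack code: the function names `solve_dense` and `linpack_style_rate` are hypothetical, the solver is plain Gaussian elimination with partial pivoting, and the operation count of 2/3 n^3 + 2 n^2 is the figure conventionally used for a solve of this kind. It assumes the randomly generated matrix is non-singular.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Works on copies, so the caller's matrix and vector are left untouched.
    """
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (b[row] - tail) / a[row][row]
    return x


def linpack_style_rate(n, seed=0):
    """Time one n-by-n solve and return (operations per second, solution)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    # Conventional floating-point operation count for an n-by-n solve.
    flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2
    return flops / elapsed, x


if __name__ == "__main__":
    rate, _ = linpack_style_rate(100)
    print(f"about {rate:.3e} floating-point operations per second")
```

The number this prints says more about the Python interpreter than about the hardware, but the shape of the test is the same: time one large solve and divide a fixed operation count by the elapsed time. Nothing in it exercises disk storage or a network.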
And though Linpack tests a type of math called "floating-point" calculations, which involve a continuous spectrum of numbers, it doesn't test "integer" operations, which involve whole numbers. Integer operations are used in problems such as processing genetic sequences.
The HPC Challenge Benchmark suite, in contrast, includes tests such as Stream, which measures how fast data can be transferred from memory to a processor; Ptrans, which measures how fast one processor in a supercomputer can communicate with another; b_eff, which measures the response time and data capacity of a network; and DGEMM, which multiplies one array of numbers, called a matrix, by another.
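Two of those tests are simple enough to sketch in a few lines of Python. The code below is an illustration, not the HPC Challenge code: the function names `triad`, `triad_bandwidth` and `matmul` are hypothetical. The "triad" operation is one of the kernels the Stream benchmark is known for, and the byte count assumes 8-byte double-precision numbers, with two arrays read and one written.

```python
import time
from array import array


def triad(b, c, scalar):
    """Stream-style triad kernel: a[i] = b[i] + scalar * c[i]."""
    return array("d", (bi + scalar * ci for bi, ci in zip(b, c)))


def triad_bandwidth(n, scalar=3.0):
    """Return (bytes per second, result) for one triad pass over n doubles."""
    b = array("d", [1.0] * n)
    c = array("d", [2.0] * n)
    start = time.perf_counter()
    a = triad(b, c, scalar)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    # Two arrays read and one written, 8 bytes per double-precision element.
    bytes_moved = 3 * 8 * n
    return bytes_moved / elapsed, a


def matmul(x, y):
    """DGEMM-style kernel: multiply an m-by-k matrix by a k-by-n matrix."""
    k = len(y)
    n = len(y[0])
    return [[sum(row[p] * y[p][j] for p in range(k)) for j in range(n)] for row in x]


if __name__ == "__main__":
    rate, _ = triad_bandwidth(1_000_000)
    print(f"triad: about {rate:.3e} bytes per second")
    print(matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
```

The triad does almost no arithmetic per number moved, so on real hardware its speed is set by memory rather than by the processor. The matrix multiply is the opposite case, reusing each number many times. In interpreted Python both timings are dominated by interpreter overhead, so the sketch shows what is being measured rather than producing meaningful figures.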
The benchmark software runs all the tests simultaneously, Dongarra said, so manufacturers won't be able to run just one test or another. However, because the tests measure different aspects of a system, it's not meaningful to wrap the seven results into a single composite score, he added.
Of the seven tests in the suite, only five are measured, and changes or additions are possible, according to the Web site.
Meanwhile, the Top500 isn't going away, despite its imperfections.
"It clearly has a place. It does attract a lot of attention in using this one number to rate machines. There are some bragging rights that go along with it," Dongarra said.