China has muscled into the No. 2 spot on the list of the world's fastest supercomputers thanks, in part, to specialized Nvidia graphics chips: a technology that Intel is now pursuing to keep pace with this new trend in high-performance computing.
China's Nebulae supercomputer, located at the recently constructed National Supercomputing Centre in Shenzhen, achieved 1.271 petaflop/s (1.271 quadrillion floating-point operations per second) on the Linpack benchmark, earning it the No. 2 spot on the widely reported Top500 list. The latest list was formally presented Monday at the International Supercomputing Conference in Hamburg, Germany. (Jaguar, a Cray system at the Oak Ridge National Laboratory in Tennessee, retained the top spot.)
Nebulae achieved this "in part due to its Nvidia GPU (graphics processing unit) accelerators...Nebulae reports an impressive theoretical peak capability of almost 3 petaflop/s--the highest ever on the TOP500," according to a press release Friday.
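The two Nebulae figures can be compared directly: dividing the measured Linpack result by the theoretical peak gives the fraction of peak the machine sustained, a ratio often called Linpack efficiency. The sketch below is a back-of-the-envelope calculation, assuming a peak of exactly 3 petaflop/s; because the release says "almost 3," the true ratio is slightly higher than what it prints.

```python
# Nebulae figures quoted in the article, in petaflop/s.
LINPACK_RMAX = 1.271  # measured Linpack result
PEAK_RPEAK = 3.0      # ASSUMPTION: release says "almost 3"; 3.0 is a rounded upper bound


def linpack_efficiency(rmax: float, rpeak: float) -> float:
    """Fraction of theoretical peak sustained on the Linpack run."""
    if rpeak <= 0:
        raise ValueError("theoretical peak must be positive")
    return rmax / rpeak


eff = linpack_efficiency(LINPACK_RMAX, PEAK_RPEAK)
print(f"Linpack efficiency: about {eff:.1%} (slightly more if peak < 3)")
```

With these inputs it reports a little over 42 percent, so Nebulae sustained well under half of its theoretical peak on the Linpack run.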
Nebulae also uses Intel Xeon processors, but those are so-called commodity processors, the same kind found in standard server computers. To compete in accelerators, Intel, despite canceling its Larrabee graphics chip project, is pursuing a technology that builds on Larrabee R&D. On Monday, Intel said the first product of this kind, code-named Knights Corner, will be made on its future 22-nanometer manufacturing process (using transistor structures as small as 22 billionths of a meter) and will pack more than 50 processing cores onto a single chip.
On Tuesday, I spoke with Jack Dongarra, Distinguished Professor in the University of Tennessee's Department of Electrical Engineering and Computer Science and director of its Innovative Computing Laboratory. Dongarra introduced the Linpack benchmark, the primary yardstick used to measure supercomputer performance.
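For readers curious what a Linpack number measures: the benchmark times the solution of a dense system of linear equations, Ax = b, and converts that time into a floating-point rate. The sketch below is a toy, pure-Python illustration of the idea, not the benchmark itself (Top500 results come from a highly optimized, distributed implementation). The function names `solve_dense` and `linpack_style_rate` are hypothetical, and the operation count (2/3)n³ + 2n² is assumed to follow the classic Linpack convention.

```python
import random
import time


def solve_dense(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Overwrites a and b in place.
    """
    n = len(a)
    for k in range(n):
        # Partial pivoting: move the largest remaining entry in column k to row k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        pivot_row = a[k]
        # Eliminate column k from every row below the pivot.
        for i in range(k + 1, n):
            factor = a[i][k] / pivot_row[k]
            row = a[i]
            for j in range(k, n):
                row[j] -= factor * pivot_row[j]
            b[i] -= factor * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    return x


def linpack_style_rate(n: int, seed: int = 0) -> tuple[float, float]:
    """Time one n x n dense solve; return (flop/s, max residual |Ax - b|)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    a_work = [row[:] for row in a]  # solve_dense overwrites its inputs
    b_work = b[:]
    start = time.perf_counter()
    x = solve_dense(a_work, b_work)
    elapsed = time.perf_counter() - start
    # ASSUMPTION: classic Linpack operation count for an n x n solve.
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    residual = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    return flops / elapsed, residual


rate, residual = linpack_style_rate(150)
print(f"~{rate / 1e6:.1f} Mflop/s (pure Python), max residual {residual:.1e}")
```

Pure Python is far slower than an optimized library, so the printed rate says nothing about real hardware. The residual check mirrors how the benchmark confirms that a fast answer is also a correct one.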
Q: Are GPU accelerators in supercomputers a trend we'll see more of in coming years?
Jack Dongarra: It looks like this is going to be one of the modes of high-performance computing: taking commodity processors (such as standard Intel or AMD server-class processors) and pairing them with specialized accelerators, in this case graphics processors.
Q: How much do GPUs generally boost performance?
Dongarra: An Nvidia board can deliver an order of magnitude (roughly 10 times) more performance than the commodity processor.
Q: But programs must be written to take advantage of this; it doesn't just happen, correct?
Dongarra: There's nothing automatic about it. You have to write a program that explicitly passes data to the GPU and tells the GPU what to do. That can be easy or hard; in most cases, writing an efficient program for those operations is a challenge. Part of the issue is that the connection between the commodity part of the computer and the graphics processor is a very thin pipe. Think of a very thin straw through which you're passing a lot of information. And once you've moved the data over there, you have to do a lot of operations on it to gain back any benefit.
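Dongarra's "thin straw" point can be illustrated with a simple cost model: offloading pays off only when the time saved on arithmetic outweighs the time spent moving data across the link. The sketch below is a hypothetical model, not a measurement. The three rates are made-up round numbers (with the GPU set 10 times faster than the CPU, echoing the "order of magnitude" figure above), the function names are illustrative, and the model ignores any overlap of transfers with computation.

```python
# Toy model of offloading work to a GPU across a narrow host<->GPU link.
# ASSUMPTION: every rate below is a hypothetical round number, not a measurement
# of any real CPU, GPU, or bus. The GPU is set 10x faster than the CPU.
CPU_FLOPS = 50e9    # hypothetical CPU throughput, flop/s
GPU_FLOPS = 500e9   # hypothetical GPU throughput, flop/s
LINK_BYTES = 5e9    # hypothetical host<->GPU bandwidth, bytes/s


def cpu_time(flops: float) -> float:
    """Seconds to do the work on the CPU, where the data already lives."""
    return flops / CPU_FLOPS


def gpu_time(flops: float, bytes_moved: float) -> float:
    """Seconds to ship data over the link, then compute on the GPU.

    Transfers are assumed not to overlap with computation.
    """
    return bytes_moved / LINK_BYTES + flops / GPU_FLOPS


def matmul_cost(n: int) -> tuple[float, float]:
    """(flops, bytes) for an n x n double-precision matrix multiply.

    2n^3 flops; A and B go over the link and C comes back: 3 * n^2 * 8 bytes.
    """
    return 2.0 * n**3, 3.0 * n * n * 8


def vector_add_cost(n: int) -> tuple[float, float]:
    """(flops, bytes) for c = a + b on n doubles: 1 flop per 24 bytes moved."""
    return float(n), 3.0 * n * 8


def offload_speedup(flops: float, bytes_moved: float) -> float:
    """Values below 1.0 mean offloading made things slower."""
    return cpu_time(flops) / gpu_time(flops, bytes_moved)


for label, (flops, nbytes) in [
    ("vector add, n=10^6", vector_add_cost(10**6)),
    ("matmul, n=100", matmul_cost(100)),
    ("matmul, n=1000", matmul_cost(1000)),
]:
    print(f"{label:20s} speedup = {offload_speedup(flops, nbytes):.2f}x")
```

With these numbers, the vector add and the 100 x 100 multiply both run slower after offloading, while the 1,000 x 1,000 multiply, which performs far more arithmetic per byte moved, comes out roughly 4.5 times faster. That extra arithmetic is the "lot of operations" needed to win back the cost of the transfer.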
Q: And what does the future hold for GPU supercomputing?
Dongarra: Two things will happen. First, the connection will improve slightly. And ultimately, the graphics processor is going to be integrated into the commodity processor. So you'll have a chip that has both the commodity processor's cores and graphics processors, or an accelerator for doing floating-point arithmetic, embedded into the chip itself. It's a path a number of companies are pursuing: Intel is one, AMD is another. Companies would like to pursue that path because it does provide the best performance, but it does require another ratchet up in chip design.
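The same trade-off can be summarized as a break-even arithmetic intensity: the minimum number of floating-point operations per byte moved for which offloading beats the CPU alone. Requiring transfer time plus GPU compute time to be less than CPU compute time gives intensity > 1 / (B(1/C - 1/G)), where C and G are the CPU and GPU rates and B is the link bandwidth. The minimal sketch below uses hypothetical rates and bandwidths to show how a slightly wider pipe lowers that bar and how an integrated design, idealized here as having no separate transfer step at all, removes it.

```python
import math

# ASSUMPTION: hypothetical round numbers, not measurements of real hardware.
CPU_FLOPS = 50e9   # hypothetical CPU throughput, flop/s
GPU_FLOPS = 500e9  # hypothetical accelerator throughput, flop/s (10x the CPU)


def break_even_intensity(cpu_flops: float, gpu_flops: float, link_bytes: float) -> float:
    """Minimum flops per byte moved for offloading to beat the CPU alone.

    Offload wins when bytes/link + flops/gpu < flops/cpu, which rearranges to
    flops/bytes > 1 / (link * (1/cpu - 1/gpu)).
    """
    if gpu_flops <= cpu_flops:
        return math.inf  # a slower accelerator never pays off
    if math.isinf(link_bytes):
        return 0.0  # idealized integration: no transfer cost at all
    return 1.0 / (link_bytes * (1.0 / cpu_flops - 1.0 / gpu_flops))


scenarios = {
    "thin pipe (5 GB/s)": 5e9,             # hypothetical
    "slightly wider pipe (8 GB/s)": 8e9,   # hypothetical
    "integrated on the CPU die": math.inf,  # idealized: no link to cross
}
for label, bandwidth in scenarios.items():
    be = break_even_intensity(CPU_FLOPS, GPU_FLOPS, bandwidth)
    print(f"{label:30s} break-even = {be:.2f} flop/byte")
```

In this idealized model the threshold falls in inverse proportion to bandwidth, so a modest link improvement helps only modestly. A real integrated chip still has finite memory bandwidth, which the model deliberately ignores.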
Dongarra added that machines have been built with accelerators before, though, of course, the chip-manufacturing technology of the time yielded different results. "There were companies that made these things that attached to mainframes," he said, citing Floating Point Systems, a company founded in 1970.