October 10, 2006 12:12 PM PDT
IBM's Power6 gets help with math, multimedia
That may not sound like anything special for a processor whose clock ticks at a rate approaching 5 billion times each second. But Power6 can count to 10--and perform numerous other mathematical operations--with the decimal digits 0 through 9 rather than the binary digits of 0 and 1 used by conventional computers.
"When we do multiplication on the chip, we can do it the same way you learned it in grade school," Brad McCredie, Power6's chief architect, said in an interview. McCredie also presented Power6 details at the Fall Processor Forum here Tuesday.
Binary math is the ordinary mode for Power6 and a natural for computers: The two digits can conveniently be represented by voltage differences and other yes-or-no, up-or-down, on-or-off differences. But humans, graced with 10 digits, generally opted for base 10, or decimal, mathematics, and a little more than half of the numeric data stored in commercial databases is decimal, McCredie said.
But precision problems can crop up when computers translate numbers into binary to perform a calculation, then translate back to the decimal system to present answers. For example, 10 percent of $1.50 should be 15 cents, not 14.9999 cents, he said. Consequently, regulations require that some tax and government applications perform math using decimal-based calculations, McCredie said.
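The rounding problem McCredie describes can be seen in ordinary software. The following is a minimal sketch in Python, assuming the language's standard `decimal` module as a stand-in for decimal arithmetic; it is a software library, not Power6's hardware instructions, and the variable names are illustrative only.

```python
from decimal import Decimal

# Binary floating point: 0.1, 0.2 and 0.3 have no exact binary
# representation, so tiny errors appear in the sum.
binary_sum = 0.1 + 0.2
binary_sum_is_exact = (binary_sum == 0.3)

# The article's example in binary floating point: 10 percent of $1.50.
binary_tax = 1.50 * 0.10

# The same calculation carried out on decimal digits.
price = Decimal("1.50")
rate = Decimal("0.10")
decimal_tax = (price * rate).quantize(Decimal("0.01"))  # round to whole cents

print("binary 0.1 + 0.2 == 0.3 ?", binary_sum_is_exact)
print("binary tax: ", repr(binary_tax))
print("decimal tax:", decimal_tax)
```

The decimal version works on the digits as written, which is why it lands on exactly 15 cents.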
"There are a lot of software packages so people can run decimal math," he said, but performing the instructions in hardware speeds up processing by a factor of two to seven. It's still slower than binary math, though; the chip can't do as much in a single clock cycle.
Power family competition
Power6, a dual-core chip IBM will begin manufacturing this year for servers going on sale in mid-2007, is the latest in a series of server processors that are central to Big Blue's recovery in the Unix server market. In terms of revenue, IBM reached the top spot in the market in 2005 over Hewlett-Packard and Sun Microsystems, though the company has given back some of those gains in the first half of 2006.
The Power family, which also includes lower-end PowerPC models, competes chiefly with Itanium chips from Intel, Sparc from Sun and Fujitsu, and x86 chips from Intel and Advanced Micro Devices.
The Power and PowerPC lines will grow one step closer together with Power6, which incorporates the AltiVec instruction set that speeds up many multimedia tasks. AltiVec, also known as VMX, increases efficiency by letting a single processing instruction be applied to multiple data elements. That's helpful for video and audio tasks on desktop machines, but servers will benefit as well in, for example, high-performance computing tasks such as genetic data processing, McCredie said.
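The idea of one instruction acting on several data elements can be illustrated with a small simulation. This is a conceptual sketch in Python, not AltiVec code: the function names are hypothetical, and the lane count of four assumes 32-bit values packed into a 128-bit vector register.

```python
LANES = 4  # assumption: four 32-bit values per 128-bit vector register


def scalar_add(a, b):
    """Add two lists one element at a time, counting one operation per element."""
    out, ops = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        ops += 1
    return out, ops


def vector_add(a, b, lanes=LANES):
    """Add two lists in chunks of `lanes`, counting one operation per chunk."""
    out, ops = [], 0
    for i in range(0, len(a), lanes):
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        ops += 1
    return out, ops


samples_a = [1, 2, 3, 4, 5, 6, 7, 8]
samples_b = [10, 20, 30, 40, 50, 60, 70, 80]

scalar_result, scalar_ops = scalar_add(samples_a, samples_b)
vector_result, vector_ops = vector_add(samples_a, samples_b)
print(scalar_ops, "scalar operations vs", vector_ops, "vector operations")
```

Both paths produce the same sums; the vector path simply issues fewer operations, which is where the efficiency gain comes from.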
Adding AltiVec was a tradeoff, he said. It's a valuable feature, but electrical current "leakage" problems in today's chipmaking technology mean that even idle parts of a chip consume power and produce waste heat.
Power6 will run at speeds of 4GHz to 5GHz, IBM has said. "It will be closer to 5GHz than it is to 4GHz," McCredie said.
To keep up with the faster clock speeds--about twice that of the fastest current Power5, which runs at 2.3GHz--IBM increased the Power6's communication abilities. Where Power5 can transfer data on and off the chip at a rate of 150 gigabytes per second, Power6 can do so at 300GBps, McCredie said.
IBM also has moved some higher-end reliability features from its mainframe line to Power6, he said. The idea is to catch and fix as many errors as possible before software has to be interrupted.
At each cycle, the chip records the state of all the data it's storing; if an error is detected, the chip can revert to its previous state to retry the processing step, McCredie said. If the error is more severe, the entire state of the processor can be moved to a new processor core, an ability called "CPU hot spare."
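The record-and-retry approach is done in hardware on Power6, but the logic can be sketched in software. The following Python is an analogy only, with hypothetical names (`run_with_retry`, `TransientFault`, `PersistentFault`) and an assumed retry limit; it is not IBM's mechanism.

```python
import copy


class TransientFault(Exception):
    """Hypothetical signal that an error was detected during a step."""


class PersistentFault(Exception):
    """Raised when retries are exhausted; carries the last good state."""

    def __init__(self, checkpoint):
        super().__init__("retries exhausted")
        self.checkpoint = checkpoint


def run_with_retry(state, step, max_retries=3):
    """Run `step` on a copy of `state`, retrying from the checkpoint on error."""
    checkpoint = copy.deepcopy(state)  # record the known-good state
    for _ in range(max_retries + 1):
        try:
            # Always start from the checkpoint, never from a corrupted copy.
            return step(copy.deepcopy(checkpoint))
        except TransientFault:
            continue  # revert and retry the same step
    # Analogous to the more severe case: hand the saved state to a spare core.
    raise PersistentFault(checkpoint)


calls = {"n": 0}


def flaky_step(s):
    calls["n"] += 1
    if calls["n"] == 1:
        s["acc"] = -999  # simulate corrupted data on the first attempt
        raise TransientFault()
    s["acc"] += 1
    return s


result = run_with_retry({"acc": 1}, flaky_step)
print(result)
```

Because every attempt starts from the saved copy, the corrupted value from the failed attempt never reaches the final result.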
In addition, every data pathway is checked to make sure data isn't corrupted as it moves within the chip, he said.
Each Power6 chip has dual processing cores, and each core has 4MB of high-speed level-two cache memory to itself, compared with a 2MB shared cache in Power5. In addition, the two cores can share an optional 32MB of level-three cache separate from the chip, McCredie said.
Each core can simultaneously handle two instruction sequences, called "threads." The performance of the second thread is about 55 percent of the first on database transaction tasks, McCredie said, which is about double the performance of the second thread on Power5.
To improve virtualization abilities, Power6 can be subdivided into as many as 1,024 separate partitions, each with its own operating system. Customers aren't likely to want slivers that thin, though, he said. "I don't think we're going to deliver that to the customer. I think we're going to stop at 200 or so," McCredie said.
A Power6 chip can connect directly to three others in four-socket groupings using a first-tier communication fabric. And each of those groupings can connect directly with seven others over a second-tier communication fabric. The two-tier fabric keeps all the processors' cache memories synchronized.
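Those figures imply an upper bound on how large a directly connected system could be. The arithmetic below is a back-of-the-envelope sketch in Python that assumes every link in both tiers is populated; it uses only the numbers given in this article and is not a statement about any server IBM will actually ship.

```python
# Figures taken from the article.
CHIPS_PER_GROUP = 1 + 3    # a chip plus the three others it connects to
GROUPS = 1 + 7             # a grouping plus the seven others it connects to
CORES_PER_CHIP = 2         # Power6 is a dual-core chip
THREADS_PER_CORE = 2       # each core handles two threads

# Assumption: both fabric tiers are fully populated.
max_chips = CHIPS_PER_GROUP * GROUPS
max_cores = max_chips * CORES_PER_CHIP
max_threads = max_cores * THREADS_PER_CORE

print(max_chips, "chips,", max_cores, "cores,", max_threads, "threads")
```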