Speeds and Feeds

October 7, 2009 8:07 AM PDT

ATI and Nvidia face off--obliquely

by Peter Glaskowsky

Nvidia and Advanced Micro Devices' ATI division are taking different approaches to graphics processing in the next generations of their products. Both strategies have strengths and weaknesses, and I think it's too soon to pick the eventual winner in this long-running fight.

Before I get into my analysis, I should say that Nvidia paid me to write a white paper on the implications of its new GPU architecture (code-named Fermi) for high-performance computing applications. The white paper was released as part of the Fermi launch event at Nvidia's GPU Technology Conference last week.

Nvidia also paid for white papers from two other well-known microprocessor analysts, Nathan Brookwood of Insight64 and my friend and former colleague Tom Halfhill of Microprocessor Report. UC Berkeley professor David Patterson wrote a fourth white paper, and Nvidia wrote one of its own. All of these works take a different approach to the subject; all are worth reading if you need to understand what Fermi is all about.

In short, I think the Fermi architecture has been more thoroughly white-papered than any graphics chip design in history. All five of these documents are available on the Fermi home page on Nvidia's Web site, and just in case that page is moved or changed, you're welcome to take advantage of my own mirror of my white paper.

I've spent much of the last several days reading these documents plus David Kanter's excellent article on Fermi over on his Real World Technologies site. David managed to get some details on Fermi that Nvidia didn't give to the rest of us.

I've also had time to go through the coverage of ATI's recent launch of the RV870, which is what Nvidia's Fermi-based chips will be competing against. The first of Nvidia's chips bears the internal code name of GF100, and it's huge. Here's a life-size photo:

...

September 28, 2009 4:46 PM PDT

Explaining Intel's Turbo Boost technology

by Peter Glaskowsky

Intel promotes the Turbo Boost technology in its new Core i7 Mobile processors as a way to adapt to the needs of the software and get more performance from the chip, but this isn't the real reason the technology exists.

The new "Clarksfield" Core i7 Mobile processors introduced at the Intel Developer Forum last week are certainly very impressive. They're huge high-performance quad-core chips with Hyper-Threading, support for two channels of DDR3-1333 DRAM, and an on-die PCI Express controller for the fastest possible connection to discrete graphics chips.

Mooly Eden and Core i7 Mobile processor

Intel VP Mooly Eden shows off the new Core i7 Mobile processor and its companion I/O controller at the Intel Developer Forum.

(Credit: Intel)

In his IDF session announcing these parts, Intel Vice President Mooly Eden said the best of them, the 2GHz Core i7-920XM Extreme Edition, is "the fastest quad-core processor, the fastest dual-core processor, and the fastest single-core processor"--all in one chip.

The key to this dramatic claim is a feature called Turbo Boost technology. Basically, if the current application workload isn't keeping all four cores fully busy and pushing right up against the chip's TDP (Thermal Design Power) limit, Turbo Boost can increase the clock speed of each core individually to get more performance out of the chip.

It's easy to see how this works when just one or two cores are being actively used; whatever power the other two or three cores would have consumed can be redirected over to the active cores, allowing them to run at higher speeds.

The quad-core mode of Turbo Boost is a little more subtle; it works when the four cores aren't running a worst-case workload--for example, integer-heavy processing, since it's generally floating-point calculations that consume the most power--so they aren't bumping into the TDP limit. Turbo Boost can increase the frequency of all four cores until they're running as fast as they can for the current workload.
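
A toy model can make the power-budget idea concrete. The sketch below is not Intel's algorithm. The power limit, per-core power figures, step size, and clock ceiling are hypothetical values chosen for illustration, and it assumes power scales linearly with clock speed, which is a simplification.

```python
# Toy model of a Turbo Boost-style power budget. NOT Intel's algorithm:
# every constant except the base clock is a hypothetical illustration value.

TDP_WATTS = 55.0        # hypothetical package power limit
BASE_GHZ = 2.0          # base clock of the Core i7-920XM (from the text)
STEP_GHZ = 0.133        # hypothetical size of one frequency step
MAX_GHZ = 3.2           # hypothetical ceiling on the boosted clock
IDLE_CORE_WATTS = 1.0   # hypothetical power drawn by an idle core


def boosted_clock(active_cores, watts_per_core_at_base, total_cores=4):
    """Return the highest clock (GHz) the active cores can reach while the
    modeled package power stays at or below TDP_WATTS.

    Assumes per-core power scales linearly with clock (a simplification).
    """
    idle_watts = (total_cores - active_cores) * IDLE_CORE_WATTS
    steps = 0
    while True:
        next_ghz = BASE_GHZ + (steps + 1) * STEP_GHZ
        if next_ghz > MAX_GHZ:
            break  # hit the frequency ceiling
        next_watts = (idle_watts
                      + active_cores * watts_per_core_at_base * next_ghz / BASE_GHZ)
        if next_watts > TDP_WATTS:
            break  # one more step would exceed the power limit
        steps += 1
    return BASE_GHZ + steps * STEP_GHZ


# Worst-case workload on all four cores: no headroom, so no boost.
print(boosted_clock(4, 13.75))
# Lighter (for example, integer-heavy) workload on all four cores: some boost.
print(boosted_clock(4, 11.0))
# One busy core: the idle cores' power budget is redirected to it.
print(boosted_clock(1, 13.75))
```

In this model the single-core case runs into the clock ceiling rather than the power limit, while the lighter four-core case stops when the next step would push the package over its budget.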

Eden said that the Turbo Boost controller ...

September 28, 2009 8:01 AM PDT

Intel's Lynnfield mysteries solved

by Peter Glaskowsky

The mysteries of the Lynnfield and Jasper Forest die photos (from last week's post titled "Investigating Intel's Lynnfield mysteries") were all cleared up at the Intel Developer Forum last week, and as expected, there was nothing sinister going on--just some confusion in Intel's graphic arts department.

With the assistance of the always-helpful George Alfs of Intel's press relations department and Intel vice president Mooly Eden (general manager of Intel's PC Client Group), we got everything straightened out. Literally!

Here's the die photo of Intel's Lynnfield chip from my previous post:

Lynnfield die photo

Die photo of the Core i5/Core i7 processor code-named Lynnfield, with labels.

(Credit: Intel)

This is the newest (shipping) part based on the Nehalem microarchitecture, differing from the earlier Bloomfield by the addition of an on-die PCI Express controller. Both chips are made in Intel's 45nm process technology.

According to Eden, the Lynnfield chip design is shared with several other Intel chips that will be on the market soon, including ...

September 21, 2009 6:30 AM PDT

Investigating Intel's Lynnfield mysteries

by Peter Glaskowsky

I have a few questions to ask at this week's Intel Developer Forum....

Why is Intel using a more expensive chip for the new Core i5 and cheaper Core i7 processors? Why does this new chip--code-named Lynnfield--appear to have features Intel isn't using? What's the connection between Lynnfield and a future Intel chip code-named Jasper Forest?

These questions arose as I was getting ready for IDF by reviewing recent press releases and news stories about Intel's current and forthcoming products, and chatting with fellow analysts about what we're looking forward to seeing there.

The recent announcements of the Core i5 and new Core i7 processors seemed pretty straightforward. Consider Brooke Crothers' piece on CNET: "Out with the old: Intel makes Core 'i' chips cheap." As Crothers explains, the facts are simple: the new Core i7 800-series slots in under the existing 900-series and replaces some older parts. The Core i5 is a new line, clearly positioned below the Core i7. Features, performance, and prices are all lower. That's as it should be.

But in looking at the coverage on some enthusiast sites, a fact jumped out at me. The Lynnfield chip is 12.5 percent larger than the Bloomfield chip used in the higher-priced Core i7 900-series processors (296 square mm vs. 263 square mm), in spite of the fact that Lynnfield only has two memory interfaces and no QuickPath Interconnect (QPI) link.

The big difference between the chips is the addition of 16 lanes of PCI Express on Lynnfield, but that's only about 80 pins plus the control logic. The changes should have roughly canceled each other out. Maybe one chip would be a little bigger than the other, but not by this much.
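
The size gap is easy to check from the two die areas quoted above. This is only a quick arithmetic sketch; the variable names are mine.

```python
# Die areas in square millimeters, as quoted in the text.
lynnfield_mm2 = 296
bloomfield_mm2 = 263

extra_mm2 = lynnfield_mm2 - bloomfield_mm2
percent_larger = 100 * extra_mm2 / bloomfield_mm2

print(f"Lynnfield is {extra_mm2} square mm ({percent_larger:.1f} percent) larger")
```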

...

August 31, 2009 5:35 AM PDT

High-end server chips breaking records

by Peter Glaskowsky

How would you like a single-chip microprocessor with more than four times the performance (on some applications) of Intel's best Core i7?

Then consider that up to 32 of these chips can be directly connected to form a single server, achieving four times the built-in scalability of Intel's next-generation Nehalem-EX processor.

That's IBM's widely anticipated Power7, which it described at last week's Hot Chips conference. But if you're interested, you'd better be prepared to spend a lot more than four times as much per chip. IBM isn't talking about pricing, but large Power servers can cost more than $10,000 per processor.

IBM Power7 die photo

IBM's forthcoming Power7 server processor has eight cores, manages 32 threads, and includes 32MB of on-chip embedded DRAM cache. Power7 also has the highest levels of off-chip bandwidth ever achieved by a microprocessor.

(Credit: IBM)

What makes the Power7 so powerful? Each chip has eight cores, and each core supports four-way multithreading. There's 32MB of level-3 cache on the chip, made using embedded DRAM (eDRAM) cells. Most CPUs use SRAM for cache because it's generally easier to combine with high-performance logic, but DRAMs--with only one transistor per bit--offer compelling density advantages. IBM spent years developing a new kind of eDRAM that would work with SOI (silicon on insulator) manufacturing processes, and the Power7 is the most advanced product to use the new technology.

Interestingly, the Power7 cores run much more slowly than those in the Power6 processor, which I wrote about here in 2007 ("Live from Hot Chips 19: Session 1, IBM's Power6"). The Power6 was designed to run very fast using a long CPU pipeline in order to deliver the highest possible performance on each thread of execution.

Maybe that strategy didn't work out as well as IBM hoped, because the Power7 returns to a more traditional microarchitecture with a shorter pipeline and much lower clock rates--though IBM didn't say exactly what those rates would be.

IBM did, however, promise that the Power7 would be roughly four times as fast as the Power6, chip for chip. Since it has four times as many cores, each of the new slower-clocked cores must still deliver about as much performance as those in the previous generation.

Chip-level performance must always be matched by off-chip connections lest the incoming data or outgoing results be bottlenecked by a too-slow channel. Accordingly, the Power7 is equipped with eight I/O channels for DRAM, each of which connects to an off-chip buffering device that splits the channel into two 64-bit DRAM interfaces. Altogether, IBM says the Power7 has 180 GBps of DRAM interconnect that can sustain over 100 GBps of effective memory bandwidth.

There's another 50 GBps of peak I/O bandwidth and a staggering 360 GBps of peak bandwidth used to let each Power7 chip communicate with others. The DRAM connected to each chip is thus shared across larger systems.

Combining these figures, IBM says a single Power7 has 590 GBps of total off-chip bandwidth. That overstates the usable bandwidth, since many of those bytes are used for error-correcting codes and other overhead, but it's still pretty impressive.
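
For reference, here is how the quoted figures add up. The sketch uses only numbers from the text; the ratios it prints are my own derived values, not IBM's.

```python
# Peak off-chip bandwidth figures for Power7, in GBps, as quoted in the text.
dram_gbps = 180
io_gbps = 50
chip_to_chip_gbps = 360

total_gbps = dram_gbps + io_gbps + chip_to_chip_gbps

# The text says "over 100 GBps" sustained, so treat this as a lower bound.
sustained_dram_gbps = 100

print(f"Total off-chip bandwidth: {total_gbps} GBps")
print(f"Chip-to-chip share of total: {chip_to_chip_gbps / total_gbps:.0%}")
print(f"Sustained vs. peak DRAM bandwidth: at least {sustained_dram_gbps / dram_gbps:.0%}")
```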

So is Power7's die size: 567 square millimeters for 1.2 billion transistors. That's nearly a square inch! IBM says that if the 32MB L3 cache had been manufactured using SRAM, the transistor count would have been 2.7 billion instead.
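
IBM didn't explain how it arrived at 2.7 billion, but a rough estimate lands close. The sketch below assumes a conventional six-transistor SRAM cell and one check bit for every eight data bits; both are my assumptions, not figures from IBM.

```python
MIB = 1024 * 1024

data_bits = 32 * MIB * 8             # 32MB L3 cache, from the text
stored_bits = data_bits * 9 // 8     # ASSUMPTION: 1 check bit per 8 data bits

EDRAM_TRANSISTORS_PER_BIT = 1        # from the text
SRAM_TRANSISTORS_PER_BIT = 6         # ASSUMPTION: conventional 6T SRAM cell

actual_transistors = 1.2e9           # Power7 as built, from the text

# Swapping each eDRAM cell for an SRAM cell adds 5 transistors per stored bit.
extra = stored_bits * (SRAM_TRANSISTORS_PER_BIT - EDRAM_TRANSISTORS_PER_BIT)
sram_total = actual_transistors + extra

print(f"Estimated count with an SRAM cache: {sram_total / 1e9:.2f} billion")
```

Without the check-bit assumption the estimate comes out lower, so overhead of some kind appears to be part of IBM's figure.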

Still, Power7 wasn't the only high-end chip talked about at Hot Chips.

Rainbow Falls, a record for core count

Sun Microsystems was there to describe its forthcoming Rainbow Falls chip, which I assume will be marketed as the UltraSparc T3. The chip has 16 cores, each of which is reportedly able to manage 8 threads.

Sun's primary Rainbow Falls presentation focused on details of Rainbow Falls' internal and external interconnects; a second talk described the cryptographic coprocessors present in each of the chip's cores. These coprocessors--one for modular arithmetic (commonly used in public-key cryptography) and a cipher/hash unit to accelerate bulk ciphers like AES and secure hash algorithms--provide many times the performance of pure software implementations.

Fujitsu was also at Hot Chips to describe its eight-core, 2GHz Sparc64 VIIIfx processor, the latest in a long series of impressive designs from the company. Fujitsu quoted a peak performance figure of 128 GFLOPS (billions of floating-point operations per second) with a typical power consumption of just 58 watts. It did not, however, provide sustained performance or worst-case power consumption figures.
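
Fujitsu's peak figure can be unpacked with a little arithmetic. The inputs below are all from the text; the per-cycle and per-watt values are derived, and the per-watt value mixes peak performance with typical power, so it flatters the chip.

```python
# Sparc64 VIIIfx figures quoted in the text.
cores = 8
clock_ghz = 2.0
peak_gflops = 128
typical_watts = 58

# Peak FLOPS = cores x clock x floating-point operations per cycle per core.
flops_per_cycle_per_core = peak_gflops / (cores * clock_ghz)

# Peak performance over typical power: an optimistic efficiency figure.
gflops_per_watt = peak_gflops / typical_watts

print(f"Implied FLOPs per cycle per core: {flops_per_cycle_per_core:.0f}")
print(f"Peak GFLOPS per typical watt: {gflops_per_watt:.1f}")
```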

AMD, Intel vie for high-volume servers

Few of us will have direct exposure to the IBM, Sun, and Fujitsu chips. A pair of presentations from Advanced Micro Devices and Intel described products that will be much more widely available.

AMD launched its six-core Opteron processor code-named "Istanbul" earlier this year (see Brooke Crothers' coverage from June). Next year the company will begin shipping a new Opteron model currently code-named Magny-Cours (after a racetrack in France). Magny-Cours will consist of two Istanbul chips in a single package, with twice as many DRAM interfaces to support the new processor's increased performance.

AMD also teased the audience with another mention of a new processor core design that has been under development there for several years: "Bulldozer," which is now targeted at 32nm process technology. This new core will incorporate new x86 instruction-set extensions which will probably not be adopted by Intel (a strategy that reminds me of AMD's old 3DNow extensions).

But saving the best for last--best, that is, from the perspective of anticipated sales--Intel's talk on Nehalem-EX showed just how far Intel has been able to push the technology envelope for high-volume servers.

Nehalem-EX is an eight-core version of the existing quad-core Nehalem design. The new chip also has 24MB of L3 cache done in old-school SRAM. By my calculations, about 60 percent of the chip's 2.3 billion transistors are in this cache alone.
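
One way to arrive at a figure near 60 percent is sketched below. It assumes a six-transistor SRAM cell and, in the second estimate, one check bit per eight data bits. Those are illustrative assumptions, not necessarily the ones behind the number above, and tag arrays and redundancy are ignored.

```python
MIB = 1024 * 1024

data_bits = 24 * MIB * 8          # 24MB L3 cache, from the text
chip_transistors = 2.3e9          # from the text
SRAM_TRANSISTORS_PER_BIT = 6      # ASSUMPTION: conventional 6T SRAM cell

# Data bits only.
share_data_only = data_bits * SRAM_TRANSISTORS_PER_BIT / chip_transistors

# ASSUMPTION: 1 check bit stored per 8 data bits.
stored_bits = data_bits * 9 // 8
share_with_check_bits = stored_bits * SRAM_TRANSISTORS_PER_BIT / chip_transistors

print(f"Cache share, data bits only: {share_data_only:.1%}")
print(f"Cache share, with check bits: {share_with_check_bits:.1%}")
```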

Nehalem-EX provides four links to external DRAM buffer chips supporting two DDR3 DRAM interfaces each (much like the Power7 solution) and four QuickPath Interconnect links that provide direct "glueless" connections for up to eight-processor systems (64 cores, 128 threads). Intel is also working on an external Node Controller chip for systems with up to 2,048 Nehalem-EX processors.

The aggregate bandwidth numbers for Nehalem-EX aren't as mind-boggling as those for Power7, but they're still far beyond anything available for PC-architecture servers today. Based on the presentation, I estimate Nehalem-EX could boast over 85 GBps of peak memory bandwidth and 100 GBps of chip-to-chip bandwidth, some of which must be allocated to I/O.
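
Estimates of this size follow from the link counts given above plus some assumed per-link rates. In the sketch below, the link and interface counts come from the text; the DDR3-1333 speed, 64-bit interface width, and 25.6 GBps per QPI link (6.4 GT/s, 2 bytes per transfer, both directions) are my assumptions, chosen as plausible values rather than confirmed Nehalem-EX specifications.

```python
# Counts from the text.
dram_buffer_links = 4
ddr3_interfaces_per_link = 2
qpi_links = 4

# ASSUMPTIONS: DDR3-1333 on 64-bit (8-byte) interfaces.
DDR3_MEGATRANSFERS_PER_SEC = 1333
BYTES_PER_TRANSFER = 8

# ASSUMPTION: 6.4 GT/s x 2 bytes x 2 directions per QPI link.
QPI_GBPS_PER_LINK = 25.6

memory_gbps = (dram_buffer_links * ddr3_interfaces_per_link
               * BYTES_PER_TRANSFER * DDR3_MEGATRANSFERS_PER_SEC / 1000)
chip_to_chip_gbps = qpi_links * QPI_GBPS_PER_LINK

print(f"Peak memory bandwidth: {memory_gbps:.1f} GBps")
print(f"Peak chip-to-chip bandwidth: {chip_to_chip_gbps:.1f} GBps")
```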

I expect the raw number-crunching performance of the Nehalem-EX cores to be roughly on the same level as Power7's cores. The lower ratio of bandwidth to processing power for Nehalem-EX reflects a different design target, not a design shortfall--and most importantly, a much lower selling price. There will presumably be versions of Nehalem-EX priced similarly to existing Xeon MP products, which currently top out at $2,301 each in small volumes, but that's a very reasonable price to pay for the market's most advanced x86 server processor.

August 29, 2008 5:01 AM PDT

Boxx fills in for a failing SGI

by Peter Glaskowsky

I miss the old SGI. Silicon Graphics was widely regarded as the greatest computer company in Silicon Valley back in the 1990s. Sometimes forgotten--but not gone--SGI was one of our greatest success stories and one of our greatest tragedies.

Apple may have had more revenue by virtue of shipping millions of small systems, but SGI's hardware spanned the range from video-game consoles (the Nintendo 64) to workstations to supercomputers. SGI's Unix-based operating system, IRIX, was one of the most sophisticated in the industry.

I used to lust over SGI machines. I'd obsess over lists of used SGI gear, looking for a great deal that would let me have my own IRIX box at home. In 2004, I finally bought an Octane with MXI graphics... but that was years after these machines were effectively obsolete, and I paid less than 0.5 percent (1/200th!) of the original retail price of the machine.

In the mid-to-late 1990s, SGI was not well managed, losing huge amounts of money because its leaders would not...

August 5, 2008 1:30 AM PDT

Intel's Larrabee--more and less than meets the eye

by Peter Glaskowsky

Intel announced on Monday that it will be presenting a paper at Siggraph 2008 about its "many-core" Larrabee architecture, which will be the basis of future Intel graphics processors.

The paper itself, however, has already been published, and I was able to get a copy of it. (Unfortunately, as you'll see at that link, the paper is normally available only to members of the Association for Computing Machinery.)

Larrabee block diagram

Intel's Larrabee includes "many" cores, on-chip memory controllers, a wide ring bus for on-chip communications, and a small amount of graphics-specific logic.

(Credit: Intel)

The paper is a pretty thorough summary of Intel's motives for developing Larrabee and the major features of the new architecture. Basically, Larrabee is about using many simple x86 cores--more than you'd see in the central processor (CPU) of the system--to implement a graphics processor (GPU). This concept has received a lot of attention since Intel first started talking about it last year.

...

September 18, 2007 4:54 PM PDT

IDF Fall 2007, part 5--Penryn Inside

by Peter Glaskowsky

In a technical session following Pat Gelsinger's keynote, Intel Fellows Stephen Pawlowski and Ofri Wechsler described Penryn, the newest dual-core processor from Intel. Penryn is shipping to OEMs now, with a formal launch scheduled for November 12. The full details of Penryn are available elsewhere, so I'll just focus on some interesting points from the presentation.

Penryn has a "deep power-down" state called CC6 (I don't know what the acronym means). The state saves the core's architectural state into a special on-die memory. According to the presentation, the chip's lowest power consumption can only be achieved when both cores on the chip are in the CC6 state.

Penryn will also support "dynamic acceleration," in which one core of the chip can run faster if the other ...

About Speeds and Feeds

Silicon Valley-based computer architect and chip analyst Peter N. Glaskowsky attends a variety of industry conferences throughout the year to meet with industry thought leaders and dig into the future of computing technology. In Speeds and Feeds, he analyzes trends in system architecture and interface design, as well as market and political pressures surrounding those trends. He is a member of the CNET Blog Network and is not an employee of CNET. Disclosure.
