June 1, 2009 8:05 AM PDT

Multi-threading reviewed

by Gordon Haff

I've been getting a fair number of questions about multi-threading over the past couple of weeks. The reason is that Intel has been previewing its "Nehalem EX" Xeon processor in advance of Advanced Micro Devices' six-core "Istanbul" CPU launch. Intel's Nehalem generation has simultaneous multi-threading (SMT)--which Intel calls Hyper-Threading (HT)--while Istanbul does not.

I wrote about this topic in depth a couple of years back in "Gradations of Threading," but it's worth reviewing in the context of these new server processors.

First, a little terminology.

A thread is a sequence of instructions that can execute in parallel with other threads. The details of what exactly constitutes a thread and the relationship between threads and other structures such as processes vary by operating system. However, for our purposes here, think of a thread as an independent task.

[Figure: Simultaneous multi-threading. Credit: Illuminata]

A core is, in most respects, a complete processor that includes all the hardware--execution units, registers, and so forth--required to execute a sequence of instructions. Although multiple cores on a single die or in a single package (i.e., a chip or socket) may share certain resources such as cache memories, logically each core is a full central processing unit (CPU). That multiple cores are packaged together today is essentially an implementation detail that relates to getting the best performance out of the most economically sized silicon die.

Absent multi-threading, each core can execute one thread at a time, running that thread until it has completed or until the operating system scheduler swaps it out for another thread.

SMT changes that 1:1 relationship. On a processor with SMT, more than one thread can execute on a single core at the same time--in the case of HT, it's two threads per core.
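To make the counting concrete, here is a minimal sketch of how sockets, cores, and threads per core multiply out into the number of hardware threads an operating system can schedule onto. The function name and the two server configurations are hypothetical, chosen only for illustration; they are not the specifications of any particular Intel or AMD part.

```python
def hardware_threads(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Number of threads the hardware can run at the same instant."""
    return sockets * cores_per_socket * threads_per_core


# Hypothetical two-socket server, four cores per socket, two-way SMT.
with_smt = hardware_threads(sockets=2, cores_per_socket=4, threads_per_core=2)

# Hypothetical two-socket server, six cores per socket, no SMT.
without_smt = hardware_threads(sockets=2, cores_per_socket=6, threads_per_core=1)

print(with_smt)     # 16
print(without_smt)  # 12
```

Note that these are counts of schedulable threads, not measures of performance: two threads sharing one core do not get the resources of two cores.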

SMT potentially allows a processor to be more efficiently utilized. The reason is that modern microprocessors have multiple execution units within each core. For example, they have separate logic to handle integer operations and floating-point operations. Thus, in principle, if a thread with mostly integer operations runs concurrently with a thread that mostly crunches floating-point numbers, we could keep the processor busier by running both threads at the same time than we could running them sequentially.
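A toy model can illustrate the integer-plus-floating-point case. The sketch below assumes a made-up core with exactly one integer unit and one floating-point unit, and assumes each thread can issue at most one operation per cycle. Real cores are far more complicated (wider issue, out-of-order execution, shared caches), so treat the numbers as an illustration of the idea rather than a prediction.

```python
from collections import deque


def cycles_sequential(threads):
    """Run the threads back to back, one operation per cycle."""
    return sum(len(ops) for ops in threads)


def cycles_smt(thread_a, thread_b):
    """Run two threads together on a core with one unit of each kind.

    Each cycle, every thread offers its next operation. Both issue if they
    need different units; if they collide on the same unit, only one issues,
    and priority alternates between the threads from cycle to cycle.
    """
    queues = [deque(thread_a), deque(thread_b)]
    first = 0
    cycles = 0
    while queues[0] or queues[1]:
        busy_units = set()
        for idx in (first, 1 - first):
            queue = queues[idx]
            if queue and queue[0] not in busy_units:
                busy_units.add(queue.popleft())
        first = 1 - first
        cycles += 1
    return cycles


int_heavy = ["int"] * 8
fp_heavy = ["fp"] * 8

print(cycles_sequential([int_heavy, fp_heavy]))  # 16
print(cycles_smt(int_heavy, fp_heavy))           # 8
print(cycles_smt(int_heavy, int_heavy))          # 16
```

The last line shows the flip side: two threads that both want the same execution unit gain nothing from sharing a core in this model.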

The other main benefit is to hide memory latency. CPUs have to operate on data, and that data ultimately has to come from memory or disk. Computer designs incorporate all sorts of techniques--such as caches and prefetching--to keep data close to processors in time and space. Nonetheless, processors still spend a lot of time waiting for data to arrive from relatively pokey memory. SMT lets a core keep doing useful work on another thread while one thread sits idle waiting for its data to arrive.
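Latency hiding can be sketched with an equally simple simulation. The model below is an assumption-laden toy: every thread repeats a fixed pattern of waiting `stall` cycles for data and then computing for `compute` cycles, the core does one cycle of work per clock, and it simply picks the first thread whose data has arrived. The function name and parameter values are invented for the example.

```python
def simulate(num_threads, bursts, compute, stall):
    """Return (total_cycles, busy_cycles) for a toy core.

    Each thread performs `bursts` rounds of: wait `stall` cycles for data,
    then compute for `compute` cycles. Only one thread computes per cycle.
    """
    remaining = [bursts] * num_threads    # rounds left per thread
    work_left = [compute] * num_threads   # compute cycles left in this round
    ready_at = [stall] * num_threads      # cycle when the thread's data arrives
    cycle = 0
    busy = 0
    while any(r > 0 for r in remaining):
        for t in range(num_threads):
            if remaining[t] > 0 and ready_at[t] <= cycle:
                work_left[t] -= 1
                busy += 1
                if work_left[t] == 0:
                    # Round finished: request the next data and start waiting.
                    remaining[t] -= 1
                    work_left[t] = compute
                    ready_at[t] = cycle + 1 + stall
                break  # the core does one cycle of work per clock
        cycle += 1
    return cycle, busy


print(simulate(num_threads=1, bursts=2, compute=2, stall=3))  # (10, 4)
print(simulate(num_threads=2, bursts=2, compute=2, stall=3))  # (12, 8)
```

In this toy, one thread alone keeps the core busy for 4 of 10 cycles. Two threads sharing the core finish twice the work in 12 cycles instead of the 20 that running them one after the other would take, because one thread computes while the other waits.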

SMT is therefore essentially a technique to use a processor more efficiently. It does not itself add execution resources to a core. And, in fact, the duplicated hardware (such as registers) and other logic that SMT requires take space away from other features (such as larger caches) that could themselves provide alternative ways to boost chip performance.

Intel's HT implementation--a fairly "lightweight" approach relative to IBM's on its Power processor--uses on the order of 5 percent of the total chip area to deliver typical performance gains of between 10 and 20 percent. (Optimized applications can see bigger gains. On the other hand, applications that are already efficiently using the CPU's execution units--or that are bottlenecked in ways that SMT can't assist with--may see no gain at all.)
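The trade-off in those figures is easy to work through. The sketch below assumes the 5 percent is extra area on top of an otherwise identical chip without SMT (one reading of "on the order of 5 percent"), and compares performance per unit of area against that baseline. The function name is made up for the example.

```python
def perf_per_area(speedup: float, area_overhead: float) -> float:
    """Performance per unit area, relative to a baseline chip without SMT.

    speedup and area_overhead are fractions, e.g. 0.10 for 10 percent.
    A result above 1.0 means the added silicon more than pays for itself.
    """
    return (1 + speedup) / (1 + area_overhead)


for gain in (0.10, 0.20, 0.0):
    print(f"{gain:.0%} gain: {perf_per_area(gain, 0.05):.3f}")
# 10% gain: 1.048
# 20% gain: 1.143
# 0% gain: 0.952
```

Under these assumptions, the typical case comes out ahead, while an application that sees no gain is paying for silicon it cannot use. What the calculation cannot say is whether the same area spent on, say, more cache would have done better for a given workload.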

Ultimately, SMT is just one performance feature among many that may or may not be a match for a given processor's design. In Intel's case, it's been in some x86 designs but not others since it debuted on the Pentium 4; Itanium uses a simpler temporal multi-threading approach.

SMT's in the plus column of the features checklist. But what really matters is overall processor performance on relevant workloads and platform capabilities. SMT is one tool to get there.

Gordon Haff is a principal IT adviser at Illuminata and has more than 20 years of IT industry experience. He writes about what's happening with enterprise servers and data centers, "Yotta-scale" computing, and related software and device trends as part of the CNET Blog Network.
by GRobLewis June 1, 2009 10:17 AM PDT
A supercomputer startup called Tera attempted to build systems in the 1990s with, IIRC, 128 threads per core. The challenge, as always, was writing compilers to efficiently parallelize applications. But on hand-coded tests like integer sort algorithms, the Tera could really fly. (The company eventually absorbed Cray and dropped its multithreaded architecture in favor of Cray's, I believe.)
by ghaff June 1, 2009 10:26 AM PDT
More recently SiCortex was doing something along those lines but they recently shut down.
by Pishkado June 1, 2009 11:11 AM PDT
A chip has N transistors. Thanks to Moore's "law," N has become in the last five years too large to use them all effectively to make a fast single-thread processor core. As a result, today's processors have C cores that can execute T threads each. The chip designer challenge is to pick the optimum values of C and T, taking into account the eventual market (and therefore target cost) of each chip, which in turn depends on ... but this is a comment, not a dissertation.

One limiting case is C=1, T=maximum possible (one core with the highest possible degree of multithreading). Not cost-effective because processor resources, even with multiple execution units, become a bottleneck.

The other limiting case is T=1, C=maximum possible (lots of single-threaded cores). Not cost-effective because, as Haff discussed, some degree of multithreading helps more than it hurts.

Therefore, except at the very low end, we end up with C>1 and T>1. The exact values vary with the chip, the market, the designers' preferences and will both increase over time (to the also-increasing sound of programmers tearing their hair out - but that's also another topic).
About The Pervasive Data Center

This blog takes a deep (and often skeptical) look at trends big and small in the world of enterprise servers, data centers, and "Yotta-scale" computing. This means also taking into account the myriad of software, networks, and devices that are driving change in (or being driven by) these back-end systems. Stories posted to this blog may also appear on Illuminata's site.

Gordon Haff is a principal IT adviser for Illuminata of Nashua, N.H. Before becoming an IT industry analyst, Gordon held a variety of product-marketing positions at Data General, spanning more than a decade. He's programmed for DOS, Windows, and Linux; builds his own PCs; and holds engineering degrees from MIT and Dartmouth, with an MBA from Cornell. He is a member of the CNET Blog Network and is not an employee of CNET.
