
The Pervasive Data Center

November 6, 2009 6:00 AM PST

Intel's James Reinders on parallelism - Part 2

by Gordon Haff

Intel's James Reinders is an expert on parallelism; his most recent book covered the C++ extensions for parallelism provided by Intel Threaded Building Blocks. He's also the Director of Marketing and Business for the company's Software Development Products. In Part 1 of our discussion at the Intel Developer Forum in September we talked about how to think about performance in a parallel programming environment, why such environments give developers headaches, and what can be done about it.

Here, in Part 2, we move on to cloud computing, functional and dynamic languages, and what needs to happen with computer science education.

Few wide-ranging conversations these days would be complete without at least a nod to cloud computing, which Reinders views as very much connected to the matter of parallel programming.

Cloud computing is parallel programming. You're solving the same problem. In fact, someone that's good at decomposing a program to run in parallel on a multicore or on a supercomputer... the same thought process is necessary to decompose a problem in cloud computing. What's different in cloud computing is that the cost of a connection or a communication between two different clouds is so high. You really need to get it right. It works best when a little message is sent, does an enormous amount of computing, and gets a little message back.

Data parallelism tends to be very fine-grained.

Task parallelism like we see with Cilk and Threaded Building Blocks is a little bit more coarse.

Cloud computing has to be very very coarse-grained parallelism.

But there's something common about how you have to think about it.

The tools that will let people do cloud computing, express a problem in cloud computing, may eventually just map onto a multicore.

The granularity that Reinders discusses refers to how small a chunk of computing can be, given the cost and latency of communications. Within a single processor, communications bandwidth is high and latencies are low, so software can afford to perform a relatively small task and then synchronize the results. (Although moving large amounts of data can still be relatively "expensive," which is why data parallelism can be finer-grained than task parallelism; see Part 1 for further background on data parallelism.)

By contrast, external communication networks have limited bandwidth and are relatively slow--on the order of four or five orders of magnitude slower than communications within a system. Therefore, tasks have to be parceled out in relatively large chunks that, ideally, don't have to be packaged up with a significant amount of local data.
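The granularity argument can be made concrete with a back-of-the-envelope model. The sketch below is an illustration rather than a measurement: the function names and both latency figures (100 nanoseconds inside a system, 10 milliseconds across a network, five orders of magnitude apart) are assumed values picked only to match the rough ratio described above.

```python
def efficiency(compute_seconds: float, comm_seconds: float) -> float:
    """Fraction of wall-clock time spent on useful work for one task,
    assuming one round of communication per task (a simplification)."""
    return compute_seconds / (compute_seconds + comm_seconds)


def min_task_size(comm_seconds: float, target: float = 0.9) -> float:
    """Smallest compute time per task that still reaches the target
    efficiency, obtained by solving efficiency() >= target for compute."""
    return comm_seconds * target / (1.0 - target)


# Assumed, illustrative latencies; these are not figures from the article.
ON_CHIP = 100e-9   # 100 ns between cores in one system
NETWORK = 10e-3    # 10 ms between machines

for name, latency in [("on-chip", ON_CHIP), ("network", NETWORK)]:
    print(f"{name}: each task must compute for at least "
          f"{min_task_size(latency):.2e} s to stay 90% efficient")
```

Under these assumptions the minimum worthwhile task grows in direct proportion to the communication cost, which is the sense in which cloud work has to be parceled out in much larger chunks than work on a multicore chip.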

Next up was education. Here, Reinders' basic message was to focus on theory before diving into implementation details. I suspect that this highlights one of the key challenges: Parallel programming tends to require a solid grasp of programming theory and doesn't lend itself particularly well to just "hacking around" in the absence of that grounding.

I've been doing a lot in the area of teaching parallelism. What a lot of people think of right away is teach them locks, teach them mutexes [algorithms to prevent the simultaneous use of a common resource], teach about how to create a thread, destroy a thread. That's all wrong. You want to be talking at a higher level. How do you decompose an algorithm? What is synchronization in general? Why does it exist?

Things I would hope undergraduates would learn are parsing theory, DAG representations [a tool used to represent common subexpressions in an optimizing compiler], database schemas, data structures, algorithms. All these are high level, not things like [the programming language] Java. Parallel programming's like that too. You get hands-on touching the synchronization method or whatever but you want to teach the higher level key concepts.

Some people it's going to be more in-tune with their thinking but you try and teach it to everyone.
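One way to picture the "higher level" view of decomposition and synchronization is a tiny dependency graph. The following is a minimal sketch in Python, not anything Reinders showed; the function names are hypothetical, and the standard-library thread pool is used only to make the structure visible.

```python
from concurrent.futures import ThreadPoolExecutor


def add(x, y):
    return x + y


def multiply(x, y):
    return x * y


def evaluate(a, b, c, d):
    """Compute (a + b) * (c + d) as a small dependency graph."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # The two sums share no data, so they are independent tasks.
        left = pool.submit(add, a, b)
        right = pool.submit(add, c, d)
        # Synchronization point: the product cannot start until both
        # sums exist. result() blocks until each one is ready.
        return multiply(left.result(), right.result())


print(evaluate(1, 2, 3, 4))  # (1 + 2) * (3 + 4) = 21
```

No lock or mutex appears in the sketch. The only synchronization is the data dependency itself, which is the kind of concept the quote argues should be taught first.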

Given that most of today's languages weren't expressly designed for parallel programming, discussions about parallelism often turn to new programming languages. This means functional languages most of all but can also involve dynamic or scripting languages which generally handle more low-level details under the covers than do Java or C++.

Functional languages don't lend themselves to easy, or easily comprehensible, description. A common shorthand is that "Functional programming is a style of programming that emphasizes the evaluation of expressions, rather than execution of commands." But that probably doesn't help much if you don't already know what it is. As for Wikipedia's entry, Tim Bray--no programming slouch--called it fairly impenetrable. (Perhaps you begin to see the problem.)
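A small example may help more than the shorthand does. The sketch below computes the same sum of squares twice in Python, which is not a functional language but supports both styles; the function names are made up for illustration.

```python
from functools import reduce


def sum_of_squares_imperative(numbers):
    """Command style: a sequence of statements that update `total`."""
    total = 0
    for n in numbers:
        total += n * n
    return total


def sum_of_squares_functional(numbers):
    """Expression style: the result is one composed expression, and no
    variable is updated in place."""
    return reduce(lambda acc, sq: acc + sq, map(lambda n: n * n, numbers), 0)


data = [1, 2, 3, 4]
print(sum_of_squares_imperative(data), sum_of_squares_functional(data))  # 30 30
```

Because the expression version never updates shared state, each square could in principle be computed independently of the others, which is part of why functional style keeps coming up in discussions of parallelism.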

A couple of things I'm interested in functionals for. We don't wake up one day and everyone uses. It's sequential semantics again and sequential semantics appeal to people and functional languages don't have them. But some people eat them up.

And they solve amazing problems. You can code things up in them that are much easier to understand than if they are written in a traditional language although they can be cryptic or terse to a lot of programmers.

Erlang [a functional language] has gotten a bit more and more usage. Maybe it is creeping in. It's not going to take over the world overnight but it seems like the one that might stay around. May be talking about it 20 years from now and saying, yeah, Erlang's been around for 25 years. It might be accepted as a language. It may have legs.

But even Java. [Unlike Erlang,] It appealed to people who programmed in C and C++; it didn't challenge them to think differently. And because of the strict typing and stuff it helps [the enterprise developer] to deploy certain types of apps.

Python [a dynamic language] is interesting. It is so popular with a lot of scientists. It's on my short list of things, where if we can figure out where to partner or extend some of the things we're doing, Python's on my short list of languages that we want to help with parallelism. Maybe some of our Ct technology would apply there. We'll see if other people agree with us. Think the concepts we're talking about are pretty portable. 

Finally, we concluded our discussion with hardware. Are there opportunities at the hardware and firmware level with memory subsystems or with specific technologies such as transactional memory? Sun Microsystems was very interested in transactional memory in the context of its now-canceled "Rock" microprocessor. The basic concept behind transactional memory is to provide an alternative to lock-based synchronization by handling concurrency problems as they occur at a low level rather than having the programmer protect against them all the time.
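The idea of handling conflicts as they occur, rather than guarding against them up front, can be imitated in software. The sketch below is a hedged illustration of optimistic concurrency in Python, not a description of Rock or of any real transactional memory API: `VersionedCell` and `atomically` are hypothetical names, and a small internal lock stands in for the atomic commit that hardware would provide.

```python
import threading


class VersionedCell:
    """A value with a version counter. The internal lock only emulates an
    atomic commit; it is held for the brief check-and-write, never while
    the caller is computing a new value."""

    def __init__(self, value):
        self._value = value
        self._version = 0
        self._commit_lock = threading.Lock()

    def read(self):
        with self._commit_lock:
            return self._value, self._version

    def try_commit(self, expected_version, new_value):
        """Write only if nobody else committed since `expected_version`."""
        with self._commit_lock:
            if self._version != expected_version:
                return False  # conflict detected: the caller must retry
            self._value = new_value
            self._version += 1
            return True


def atomically(cell, update):
    """Apply `update` optimistically and retry whenever a conflict occurs."""
    while True:
        value, version = cell.read()
        if cell.try_commit(version, update(value)):
            return


counter = VersionedCell(0)


def worker():
    for _ in range(1000):
        atomically(counter, lambda v: v + 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.read()[0])  # 4 workers x 1000 increments = 4000
```

The calling code never takes a lock around its own logic. It simply assumes that no conflict will happen and redoes the work when one does, which is the trade that transactional memory proposes to make at the hardware level.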

The best solutions tend to not be silver bullets so much as incremental. Nehalem [Intel's latest microprocessor generation] in a way probably helped us more than anything in recent memory because we moved to the QuickPath interconnect and moved bandwidths up and latencies down. Larrabee [a many-core Intel microprocessor still under development] may pave the way with some innovations in interconnects. I think there may be some refinements needed. Interconnecting the processors is a classic supercomputer issue.

Transactional memory has slammed up against a very tough reality which is that hardware always wants to be finite; software solutions want to be infinite. Think there's something there. I think the people looking at transactional memory have started to make observations about locks that may end up being useful. It's funny. The mission of transactional memory is to get rid of locks but the more they looked at it the more they understood about how locks behave. There might actually be possibilities to make locks behave better in hardware.

Can we do the hardware a little differently? Not the sexiest thing in the world. But as we move from single-threaded to multi-threaded what complications are we creating things [that the hardware can help with]?

Even if you don't subscribe to the more extreme views of programming and software being in a crisis because of the move to multi-core, we're clearly in a transition. New tools are needed and programmers will have to adapt as well, to at least some degree.

November 24, 2008 11:12 AM PST

Supercomputing wrap-up

by Gordon Haff

At some point during the flight over the Pacific from Tokyo, I seriously questioned my decision to take a detour rather than heading straight to Boston and home. It wasn't that I had no interest in attending the Supercomputing show, SC08, being held in Austin last week. It's just that I was coming off of what was already a two-week trip to Japan. However, Supercomputing has been getting more and more buzz in recent years--and I hadn't been able to attend previously because of conflicts--so duty beckoned.

I was glad I made it. It was an immensely interesting and educational (albeit exhausting) couple of days. What follows are a few things that caught my eyes and ears. I plan to follow up on at least some of these in more depth when I have a chance.

Energy and attendees. First of all it's worth noting the general ambience of the show. It was hopping. Economic slump you say? One wouldn't know it from walking the exhibit floor or attending the sessions. To be sure, both booth and attendance commitments are often made well in advance. Nonetheless, I find it striking that SC08 set an attendance record--over 10,000 people--and that a lot of the exhibitors I spoke with were not only happy about the level of traffic to their booths and meetings, but were, in many cases, actually closing business. I found the general feel of the show to be at least somewhat reminiscent of a long-ago UniForum--albeit with more of an academic and application flavor.

InfiniBand is very much alive. I wrote after the October TechForum '08 event that "InfiniBand may not ever markedly expand on the sorts of roles that it plays. But 10 Gigabit Ethernet is far from ready to take over when latency has to be lowest and bandwidth has to be highest." The biggest of those roles is high-performance computing (HPC) and, indeed, InfiniBand was omnipresent at SC08. No particular surprise there but certainly lots of confirmation that InfiniBand is anything but dead. Also significant was QLogic's announcement at the show of an InfiniBand switch family. What's notable is that these switches use QLogic's own chips, rather than sourcing them from Mellanox as everyone else does. That QLogic made this design investment must count as a considerable vote of confidence in InfiniBand's future.

Clusters continue their advance. Supercomputers used to be largely bespoke hardware designs specifically constructed for HPC tasks. There's still some of that. IBM's Blue Gene is one example. A start-up, SiCortex, exhibiting at the show provides another. However, in the main, supercomputing continues to be more and more about clustering together many--mostly standard off-the-shelf--rackmount or blade servers rather than creating monolithic specialized systems. This isn't a new trend, but it continues apace (and is certainly one of the reasons that InfiniBand has been regaining visibility of late).

Microsoft makes modest gains. Microsoft made it into the top 10 of the (publicly acknowledged) largest supercomputers with the Dawning 500A at the Shanghai Supercomputer Center. There was still far more Linux--and, to a lesser degree, other flavors of Unix--at the show than Windows. But this example and others help to reinforce the notion that Microsoft products are technically capable of playing in HPC. That's not to say that Microsoft will easily insert itself into environments that are predisposed to and have in-house skills aligned with Unix tools and techniques. However, as HPC in commercial environments becomes increasingly common, it means that Microsoft has an opportunity there, where Windows typically already has a footprint.

Parallel programming is still a challenge. So much so that all-around computing guru David Patterson devoted his plenary session to the topic. That said, based on Patterson's session as well as the work of a variety of companies such as RapidMind and Pervasive Software, we may be starting to see at least the outlines of how developing for processors with many cores and for amalgams of many systems might progress. The issue is that parallel programming is hard and most people can't do it. One approach is training but we seem to be developing a consensus that neither this nor new programming tools (e.g., languages) really get to the heart of the matter. Rather, the general direction seems to be toward something you might call multicore virtualization--the abstraction of parallel complexities by carefully crafted algorithms and runtimes that handle most of the heavy lifting. (MapReduce is a good example of the sort of thing I'm talking about.)
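To give a flavor of what such an abstraction looks like from the programmer's side, here is a minimal MapReduce-style word count in Python. It is a sketch under stated assumptions, not how any production MapReduce system works: the function names are hypothetical, and a standard-library thread pool plays the part of the runtime. In standard CPython, threads will not speed up CPU-bound work like this, so the example shows the programming model only.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_phase(chunk: str) -> Counter:
    """Count words in one chunk; needs no knowledge of other chunks."""
    return Counter(chunk.lower().split())


def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Merge two partial counts into one."""
    return left + right


def word_count(chunks, max_workers: int = 4) -> Counter:
    # The "runtime" (here just a thread pool) decides how chunks are
    # scheduled; the caller writes only the two sequential functions above.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(map_phase, chunks))
    return reduce(reduce_phase, partials, Counter())


chunks = ["parallel programming is hard", "programming is fun", "parallel is hard"]
totals = word_count(chunks)
print(totals["is"], totals["parallel"], totals["hard"])  # 3 2 2
```

The person writing `map_phase` and `reduce_phase` never touches a thread or a lock. Scheduling and combining are left to the layer underneath, which is the heavy lifting the paragraph above has in mind.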

Supercomputing and HPC used to be their own world. Increasingly they illuminate the future direction of all (or at least most) computing--including the challenges ahead. That's a big reason that I find Supercomputing such a fascinating show.


About The Pervasive Data Center

This blog takes a deep (and often skeptical) look at trends big and small in the world of enterprise servers, data centers, and "Yotta-scale" computing. This means also taking into account the myriad of software, networks, and devices that are driving change in (or being driven by) these back-end systems. Stories posted to this blog may also appear on Illuminata's site.

Gordon Haff is a principal IT adviser for Illuminata of Nashua, N.H. Before becoming an IT industry analyst, Gordon held a variety of product-marketing positions at Data General, spanning more than a decade. He's programmed for DOS, Windows, and Linux; builds his own PCs; and holds engineering degrees from MIT and Dartmouth, with an MBA from Cornell. He is a member of the CNET Blog Network and is not an employee of CNET. Disclosure.
