SAN FRANCISCO--As the Intel Developer Forum gets under way this week, one hardly unexpected theme of CEO Paul Otellini's keynote address was that Moore's Law continues. Ivy Bridge, Intel's upcoming 22-nanometer processor platform, is slated for 2012. This continuation of Moore's Law means that a given area of silicon will contain more transistors.
Until relatively recently, more transistors more or less mapped directly to faster processor performance. That's because the additional transistors were primarily used to boost processor frequency and increase fast local memory--changes that were largely invisible to software. However, beginning around the middle of last decade, increasing performance this way started to hit a wall. Instead, chipmakers started to put the transistors toward more cores--essentially additional complete, independent processors from the perspective of software.
The problem is that a program designed to execute a single thread of instructions--e.g., add A to B then add the result to C--doesn't get any faster if you just add hardware to do more work independently. To reduce the time the program takes to accomplish a given task, the program has to split that task into pieces that the processor can work on in parallel.
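To make the distinction concrete, here is a minimal Python sketch of the same job written both ways. It is only an illustration: the task (summing squares) and every function name in it are hypothetical choices, not anything taken from Intel or from a real application. The serial version is one chain of dependent additions, so extra cores cannot help it. The second version cuts the work into independent pieces that separate workers could handle at the same time.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_squares_serial(n):
    """One thread of instructions: each addition waits for the previous one."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def chunk_ranges(n, parts):
    """Split range(n) into up to `parts` contiguous (start, stop) pieces."""
    step = max(1, -(-n // parts))  # ceiling division, never a zero step
    return [(start, min(start + step, n)) for start in range(0, n, step)]


def sum_squares_chunk(bounds):
    """Work on one independent piece; it needs nothing from the other pieces."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def sum_squares_parallel(n, workers=4, executor_cls=ThreadPoolExecutor):
    """Hand each piece to a worker, then combine the partial results."""
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(sum_squares_chunk, chunk_ranges(n, workers)))
```

A caveat on the sketch: it defaults to threads only to stay simple and portable. In standard CPython, threads generally do not run CPU-bound Python code on several cores at once, so passing `executor_cls=ProcessPoolExecutor` (also from `concurrent.futures`, and called from a script's main module) would be the variant that can actually spread the pieces across cores. Either way, the restructuring is the point: the programmer has to find the independent pieces, which is exactly the work that existing single-threaded software had not done.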
This is easier said than done, and it was a matter of some concern around 2006, when it was becoming clear that even client systems were headed to quad-core and beyond and that the software wasn't really ready. As I wrote at the time, Kirk Skaugen, then-GM of Intel's Server Platforms Group, said that there is "work to be done there" and that "[only] dozens of applications have been threaded today."
So what happened? Was the problem solved or did it go away for some reason?
The most straightforward explanation would of course be that the necessary tools and languages to develop highly threaded applications matured, programmers became better trained in writing such applications, and major vendors did what was needed to adapt their software to the new hardware reality. Intel's James Reinders has discussed some of the challenges and solutions with me.
I would argue that some of this has happened. Perhaps most importantly, the (relatively few) applications capable of really taxing a modern desktop system--things like video rendering, 3D modeling, and high-end gaming engines--have mostly been updated to better leverage modern client architectures. This includes not only multi-core x86 processors but also 3D graphics processors that can augment general-purpose CPUs for certain tasks.
Add to this the fact that a typical client system is also running lots of secondary tasks such as backups, security scans, and music playback. The result is that most apps that really need four cores are reasonably parallel and the rest usually run fast enough. When we wait on a desktop system, it's likely to be storage or networking that's the bottleneck--or a CPU-intensive video-editing application that's fully using all the cores anyway.
However, as far as client systems are concerned, the multi-core programming problem has also in some sense been bypassed. Much processing has moved out to server farms on the Internet--in the cloud, if you will. The client device, whether a desktop system, a notebook, a tablet, or a smartphone, is increasingly about interacting rather than processing.
The processors for these client devices need transistor density too, but they use transistors for many more ends than raw performance. These processors prioritize attributes such as small size, low power, and high levels of integration far more than they do total throughput.
Of course, parallel programming is still needed in such a cloud world. Indeed, it's needed more than ever. However, it's a different kind of problem there. It can be tackled with a much broader and more diverse set of tools and without many of the constraints associated with running an application on a traditional client PC.