Intel says to prepare for 'thousands of cores'
Intel is telling software developers to start thinking about not just tens but thousands of processing cores.

Intel Tera-scale multicore research (Credit: Intel)
Intel currently offers quad-core processors and is expected to bring out a Nehalem processor in the fourth quarter that uses as many as eight cores.
But the chipmaker is now thinking well beyond the traditional processor in a PC or server. Jerry Bautista, the co-director of the Tera-scale Computing Research Program at Intel, recently said that in a graphics-intensive environment the more cores Intel can build the better. "The more cores we have the better. Provided that we can supply memory bandwidth to the device."
On Monday, an Intel engineer took this a step further. Writing in a blog, Anwar Ghuloum, a principal engineer with Intel's Microprocessor Technology Lab, said: "Ultimately, the advice I'll offer is that...developers should start thinking about tens, hundreds, and thousands of cores now."
He said that Intel faces a challenge in "explaining how to tap into this performance." He continues: "Sometimes, the developers are trying to do the minimal amount of work they need to do to tap dual- and quad-core performance...I suppose this was the branch most discussions took a couple of years ago."
Now, however, Intel is increasingly "discussing how to scale performance to core counts that we aren't yet shipping...Dozens, hundreds, and even thousands of cores are not unusual design points around which the conversations meander," he said.
He says that the more radical programming path to tap into many processing cores "presents the 'opportunity' for a major refactoring of their code base, including changes in languages, libraries, and engineering methodologies and conventions they've adhered to for (often) most of their software's existence."
"Eventually, developers realize that the end point is on the other side of a mountain of silicon innovations...Program for as many cores as possible, even if it is more cores than are currently in shipping products."
Brooke Crothers is a former editor at large at CNET News.com, and has been an editor for the Asian weekly version of the Wall Street Journal. He writes for the CNET Blog Network, and is not a current employee of CNET. Contact him at mbcrothers@gmail.com.






My point is this: by the time Intel has ramped up to dozens of cores, we may well be on to quantum computing or some other totally different paradigm.
I do agree we need to plan for highly scalable, secure, distributed systems, but there's a huge risk if we develop for a specific mindset, and the mindset changes before it's implemented.
1) First, I am more optimistic than "most desktop apps are not inherently parallel". The good news is that the compute-intensive portions of an app - which are often the bottleneck and drive the user experience - contain plenty of parallelism. Think about a ray-tracing app, the meshing algorithm for a finite element package, or the image processing tasks in a photo editing app. Of all the FOR loops out there, a large portion can run their iterations in parallel, because no iteration needs info from a previous one. So I would argue that there's plenty of parallelism that - if exposed - can materially improve the user experience.
Here's one blog post related to this topic: http://www.cilk.com/multicore-blog/bid/5365/What-the-is-Parallelism-Anyhow
2) Second, I am less optimistic about functionally decomposing an application ("However, each chip can be dedicated to certain tasks"). Manually programming the threads can be pretty complicated, may not balance well, and may need to be re-written for each microprocessor generation.
Here's a post addressing this topic in more detail: http://www.cilk.com/multicore-blog/bid/5847/The-Folly-Of-Do-It-Yourself-Multithreading
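To make the first point concrete, here is a minimal Python sketch of a FOR loop whose iterations are independent and can therefore be handed to a pool of workers almost unchanged. It is not tied to Cilk or to any product mentioned in this thread; `brighten_row`, `brighten_serial`, `brighten_parallel`, and the tiny "image" are made-up names and data for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

def brighten_row(row, amount=10):
    """Brighten one row of 8-bit pixels; it needs nothing from any other row."""
    return [min(255, pixel + amount) for pixel in row]

def brighten_serial(image):
    # The ordinary loop: every iteration is independent of the previous one.
    return [brighten_row(row) for row in image]

def brighten_parallel(image, pool):
    # The same loop, with the rows handed to whatever executor is passed in.
    # Executor.map returns results in input order, so the output matches
    # the serial version row for row.
    return list(pool.map(brighten_row, image))

if __name__ == "__main__":
    image = [[0, 100, 250], [5, 200, 255]]  # hypothetical 2x3 image
    with ThreadPoolExecutor(max_workers=2) as pool:
        assert brighten_parallel(image, pool) == brighten_serial(image)
```

One caveat: in standard CPython, threads will not speed up CPU-bound pure-Python code because of the global interpreter lock. Passing a `ProcessPoolExecutor` instead of the thread pool is the usual way to get real multi-core speedups for a loop like this.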
Additionally, Amdahl's Law has largely been regarded as a software/application problem, not a hardware issue. Applications that are not parallelizable would simply be given dedicated computing resources. With a sufficient number of cores, that is entirely possible. In effect, this is a solution to, or a compromise with, Amdahl's Law.
However, as in all processes that stray from static naming, there will be a decrease in overall performance. As such, the developer needs to be aware of the trade-off and program accordingly.
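For readers who have not seen it written down, Amdahl's Law, invoked in the comment above, fits in a few lines. This is a minimal Python sketch of the textbook formula; the function names and the 10% serial fraction are illustrative assumptions, not figures from the article.

```python
def amdahl_speedup(serial_fraction, cores):
    """Speedup over a single core when `serial_fraction` of the work
    (measured on one core) cannot be parallelized at all."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

def amdahl_limit(serial_fraction):
    """Upper bound on the speedup as the core count goes to infinity."""
    return float("inf") if serial_fraction == 0 else 1.0 / serial_fraction

if __name__ == "__main__":
    # Hypothetical program that is 10% serial.
    for cores in (4, 32, 1000):
        print(cores, round(amdahl_speedup(0.10, cores), 2))
    print("limit:", amdahl_limit(0.10))
```

With a fixed 10% serial portion, even a thousand cores give a little under a 10x speedup, which is why the serial part of an application matters so much in this discussion.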
Sounds like this is more likely to be Intel's continued route towards its own graphics solution.
Delivering the memory bandwidth for this "properly" is going to be difficult and costly.
If we are speaking in terms of general purpose processing, my thoughts would go towards seeing instruction and memory latency issues reduced first. This could boost performance by well over 300%, a massive change.
Marketing statements like this help fend off the threat, while causing relatively little collateral damage to its current best-selling products, where Intel is pretty competitive right now. Don't worry, Intel will keep pushing single-threaded performance, latency improvements, etc. AMD and now Via will make sure they do, even ignoring the low-power market for now. They just want it known that they plan on addressing the highly-scalable market too (graphics and HPC), and in a big way.
for (@array) { do_something_intensive }
or even better, LISP's "map" construct fits very well into scatter/gather mode, scaling up to as many processes as needed.
The key is to program functionally and not use alternate data paths (side effects) for result return.
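The scatter/gather idea above can be sketched roughly as follows, in Python rather than Perl or LISP. The worker function is pure: its result comes back only as a return value, with no shared state touched along the way. `scatter`, `sum_of_squares`, and `scatter_gather` are hypothetical names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def scatter(items, parts):
    """Split `items` into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def sum_of_squares(chunk):
    # Pure function: the result depends only on the argument and is
    # handed back as a return value, never through a side effect.
    return sum(x * x for x in chunk)

def scatter_gather(items, parts, pool):
    partials = pool.map(sum_of_squares, scatter(items, parts))  # map step
    return reduce(lambda a, b: a + b, partials, 0)              # gather step

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=3) as pool:
        print(scatter_gather(list(range(10)), 3, pool))
```

Because `sum_of_squares` has no side effects, the chunks can be processed in any order and on any number of workers without changing the answer, which is the property the comment is pointing at.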
Games are even more parallel. Most AI should be parallel: an FPS has several characters to handle, strategy games could use battalion-level AI, and so on. Also, in the future, with multiple cores, games could try to simulate more of the surrounding world than just what the player can see, which would make for a more realistic environment.
--
Jedaļ
Anyway, I see big OS changes to harness a many-core world; I do not see a rewrite of the world's applications for massive parallelism in my lifetime.
Parallel Computing: Why the Future Is Non-Algorithmic
http://rebelscience.blogspot.com/2008/05/parallel-computing-why-future-is-non.html
To those invoking Amdahl's law, you should look up an equivalent formulation of the same equation called Gustafson's law. The biggest misunderstanding people have of Amdahl's law is in assuming that the percentage of non-parallelizable instructions stays constant as the problem size grows. The theoretical maximum useful number of processors depends on the problem itself. Not having access to the full number of processors due to an arbitrary limit (e.g., 32) can cause PARALLELIZABLE programs to become useless on a machine with an insufficient number of processors.
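A small Python sketch may help show in what sense the two laws are the same equation. Gustafson's law measures the serial share on the parallel machine, Amdahl's on a single core; converting one share into the other makes the two speedups agree. The function names and the sample numbers are assumptions for illustration.

```python
def amdahl_speedup(serial_fraction, cores):
    """Amdahl: `serial_fraction` is the serial share of single-core run time."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

def gustafson_speedup(scaled_serial_fraction, cores):
    """Gustafson: `scaled_serial_fraction` is the serial share of run time
    as measured on the parallel machine with `cores` processors."""
    return scaled_serial_fraction + (1.0 - scaled_serial_fraction) * cores

def to_amdahl_fraction(scaled_serial_fraction, cores):
    """Convert Gustafson's serial share into the single-core serial share."""
    s = scaled_serial_fraction
    return s / (s + (1.0 - s) * cores)

if __name__ == "__main__":
    s_scaled, cores = 0.10, 100  # hypothetical workload
    s_single = to_amdahl_fraction(s_scaled, cores)
    print(gustafson_speedup(s_scaled, cores))
    print(amdahl_speedup(s_single, cores))  # same value up to rounding
```

The conversion also shows the commenter's point: a job that spends 10% of its time in serial code on 100 cores has a far smaller serial share when measured on one core, so the serial percentage is not a constant of the program once the problem size grows with the machine.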
Even if we stay at a small number of cores (or they are distributed among many systems for geographic reasons), the code that runs on them must eventually be compiler checkable. I see nothing other than message passing concurrency (Erlang) working in the long run; lock-based programming will never fully rise to this challenge because it creates too much coupling among components in the form of locking constraints and violates encapsulation in obscene ways.
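As a rough illustration of the message-passing style described above (in Python, not Erlang), the worker below owns its state privately and other code can reach it only by sending messages. `counter_worker`, `run_counter`, and the `STOP` sentinel are hypothetical names for this sketch.

```python
import queue
import threading

STOP = object()  # sentinel message telling the worker to finish

def counter_worker(inbox, outbox):
    """Owns `total` privately; messages are the only way in or out."""
    total = 0
    while True:
        message = inbox.get()
        if message is STOP:
            outbox.put(total)  # reply with the final state, then exit
            return
        total += message

def run_counter(values):
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=counter_worker, args=(inbox, outbox))
    worker.start()
    for value in values:
        inbox.put(value)       # send work as messages
    inbox.put(STOP)
    result = outbox.get()      # wait for the reply message
    worker.join()
    return result

if __name__ == "__main__":
    print(run_counter([1, 2, 3]))
```

`queue.Queue` is synchronized internally, so locks still exist underneath, but the application code never takes one and no component needs to know another's locking rules. That is the decoupling the comment is arguing for.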
Amdahl's law applies to people doing tasks as well. Imagine a scenario in which a hundred people can equal the efforts of an entire country for many important tasks, but the manpower is just not available for the work that CAN be done in parallel. The work that is parallel will swamp the 100 workers for the small group, while the large country reduces the costs of the parallel workload to its theoretical minimum with its extra workers.
The future great programmers are those that can reduce a program to its theoretical minimum given an N processor machine, and figure out an optimal number of processors to implement on in terms of cost/performance. Big-O notation will be revised to include parallelization complexity along with space and serial time complexity.
http://kashi.webhop.net/blog/Technology/index.php/archives/22
Until we solve the concurrency issues, the increased number of cores will not achieve any higher performance.
C++ is just starting to deal with concurrency. C++0x adds thread support; however, that is just the starting point.
If and when it's finally achieved, the computers of that time would make today's computers look like UNIVACs and ENIACs.
by dtremaine
November 25, 2008 8:17 AM PST
Azul Systems in Mountain View currently has a system in production that has 864 cores.