July 1, 2008 9:50 AM PDT

Intel says to prepare for 'thousands of cores'

by Brooke Crothers

Intel is telling software developers to start thinking about not just tens but thousands of processing cores.

Intel Tera-scale multicore research

(Credit: Intel)

Intel currently offers quad-core processors and is expected to bring out a Nehalem processor in the fourth quarter that uses as many as eight cores.

But the chipmaker is now thinking well beyond the traditional processor in a PC or server. Jerry Bautista, the co-director of the Tera-scale Computing Research Program at Intel, recently said that in a graphics-intensive environment, more cores are better, with one caveat: "The more cores we have the better. Provided that we can supply memory bandwidth to the device."

On Monday, an Intel engineer took this a step further. Writing in a blog, Anwar Ghuloum, a principal engineer with Intel's Microprocessor Technology Lab, said: "Ultimately, the advice I'll offer is that...developers should start thinking about tens, hundreds, and thousands of cores now."

He said that Intel faces a challenge in "explaining how to tap into this performance." He continues: "Sometimes, the developers are trying to do the minimal amount of work they need to do to tap dual- and quad-core performance...I suppose this was the branch most discussions took a couple of years ago."

Now, however, Intel is increasingly "discussing how to scale performance to core counts that we aren't yet shipping...Dozens, hundreds, and even thousands of cores are not unusual design points around which the conversations meander," he said.

He says that the more radical programming path to tap into many processing cores "presents the 'opportunity' for a major refactoring of their code base, including changes in languages, libraries, and engineering methodologies and conventions they've adhered to for (often) most of their software's existence."

"Eventually, developers realize that the end point is on the other side of a mountain of silicon innovations...Program for as many cores as possible, even if it is more cores than are currently in shipping products."

Brooke Crothers is a former editor at large at CNET News.com, and has been an editor for the Asian weekly version of the Wall Street Journal. He writes for the CNET Blog Network, and is not a current employee of CNET. Contact him at mbcrothers@gmail.com. Disclosure.
26 Comments
by bluemist9999 July 1, 2008 10:24 AM PDT
Why not program for quantum-phase computers as well?

My point is this: by the time Intel has ramped up to dozens of cores, we may well be on to quantum computing or some other totally different paradigm.

I do agree we need to plan for highly scalable, secure, distributed systems, but there's a huge risk if we develop for a specific mindset, and the mindset changes before it's implemented.
by cnetcensorssuck July 1, 2008 12:08 PM PDT
Wrong.
by The_Decider July 1, 2008 10:43 AM PDT
IMO, the future of desktop chips is not larger and larger multicore systems but dual- or quad-core processors that improve in other areas, like cache, while the OS manages the cores more intelligently. The reasoning is that most desktop apps are not inherently parallel: you simply can't get them to run in parallel, or at least not very often. However, each core can be dedicated to certain tasks: one for the OS and one for the apps in a dual-core setup; in a quad-core system, one for the OS, two for apps and one for antivirus and antispyware (Windows only). This will also relieve the nontrivial problem of managing data across four L1 caches, the L2 and main memory. There is a lot of complexity and overhead inherent in having to do this.

Hardware companies are constantly trying to get faster chips with more cores without stopping to ask 'why are we doing this?' and 'is this of any real benefit?'. There is no need for it on the desktop; software needs to dictate hardware, hardware cannot dictate software. Progress for the sake of progress has given birth to craptastically horrible software design that uses the fact that hardware gets faster and/or more efficient over time to hide its sloppy design and programming. Microsoft is only one smelly piece of rubbish in a huge pile of refuse that is the software landscape. It has also helped spread this toxic and odd idea of spending resources parallelizing software that might only be 10-15% parallel at best. That is a massive waste of resources with no real benefit to the end user.

Of course, once you leave the desktop there is a need for chips that can handle the hardware complexity associated with parallelism, and that is where the real benefit is seen.
by ilya_cilk July 10, 2008 7:25 AM PDT
I see a couple things a bit differently:

1) First, I am more optimistic than "most desktop apps are not inherently parallel". The good news is that the compute-intensive portion of an app, which is often the bottleneck and affects the user experience, has plenty of parallelism available. Think about a ray-tracing app, the meshing algorithm for a finite element package, the image processing tasks in a photo editing app, etc. Of all the FOR loops out there, a large portion can run in parallel, without needing information from a previous iteration. So I would argue that there's plenty of parallelism that, if exposed, can materially impact user experiences.

Here's one blog post related to this topic: http://www.cilk.com/multicore-blog/bid/5365/What-the-is-Parallelism-Anyhow

2) Second, I am less optimistic about functionally decomposing an application ("However, each chip can be dedicated to certain tasks"). Manually programming the threads can be pretty complicated, may not balance well, and may need to be re-written for each microprocessor generation.

Here's a post addressing this topic in more detail: http://www.cilk.com/multicore-blog/bid/5847/The-Folly-Of-Do-It-Yourself-Multithreading
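The distinction between loops whose iterations are independent and loops that need the previous iteration's result can be sketched in Python. This is a minimal illustration, and the function names and the tiny grayscale "image" are hypothetical. Each row in the brightening loop depends only on its own pixels, so rows can be handed to a pool of workers. The running total is different: iteration i needs the total from iteration i-1, so a plain parallel map cannot replace it. In CPython the global interpreter lock limits CPU-bound speedup from threads, so the sketch shows how the work is decomposed, not a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor

def brighten_row(row, delta):
    # Each pixel depends only on its own old value, so rows (and pixels)
    # could be processed in any order, or all at once.
    return [min(255, p + delta) for p in row]

def brighten_image(image, delta, workers=4):
    # Independent iterations: hand each row to a pool worker.
    # map() returns the rows in their original order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: brighten_row(row, delta), image))

def running_total(values):
    # Loop-carried dependence: each step needs the previous total,
    # so this loop cannot simply be split across cores as written.
    totals, acc = [], 0
    for v in values:
        acc += v
        totals.append(acc)
    return totals

if __name__ == "__main__":
    image = [[10, 200, 250], [0, 128, 255]]
    print(brighten_image(image, 10))    # [[20, 210, 255], [10, 138, 255]]
    print(running_total([1, 2, 3, 4]))  # [1, 3, 6, 10]
```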
by rapier1 July 1, 2008 10:48 AM PDT
Having more cores is wonderful assuming that you have a problem that is inherently and largely parallel in the first place. However, many problems are not inherently parallel - or the weighted portion of the application that is parallelizable diminishes significantly by the time you deal with all the attendant support code. I really suggest people look up Amdahl's Law. Not only do we need more cores, we will continue to need faster cores to improve computing power.
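Amdahl's Law makes the point concrete. Suppose a fraction p of the running time can be spread across n cores. The overall speedup is then 1 / ((1 - p) + p / n), and it can never exceed 1 / (1 - p) however many cores are added. A small Python sketch of the formula (the function name is illustrative):

```python
def amdahl_speedup(p, n):
    """Speedup predicted by Amdahl's Law.

    p: fraction of the original run time that can be parallelized (0..1)
    n: number of cores applied to that fraction
    """
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    # The serial part (1 - p) is unchanged; the parallel part shrinks by n.
    return 1.0 / ((1.0 - p) + p / n)

if __name__ == "__main__":
    # With 90% of the work parallelizable, even 1,000 cores stay below 10x.
    for cores in (2, 8, 1000):
        print(cores, round(amdahl_speedup(0.9, cores), 2))
```

With p = 0.9, going from 8 cores to 1,000 only moves the speedup from about 4.7x to about 9.9x. This is why faster individual cores still matter.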
by rrod182 July 1, 2008 12:27 PM PDT
Software does not need to be designed for parallel processing for the overall system to benefit from multiple cores. As long as the OS can support multiprocessor scheduling, you will greatly benefit. In fact the more cores the better because the system does not have to rely on better programming for better performance. Better scheduling and more cores will prevent poorly designed or planned apps from starving the entire system of processing resources.

Additionally, Amdahl's Law has been largely regarded as a software/application problem, not a hardware issue. Applications that are not parallelizable would simply be given dedicated computing resources; with a sufficient number of cores that is entirely possible. In effect, this is a solution/compromise to Amdahl's law.
by The_Decider July 1, 2008 9:10 PM PDT
Nope, we need software developers who can write efficient code. That will increase computing power far more than adding more and more cores to run the same crappy programs. All the "programmers" who learned from a 'for dummies' book don't have the first clue how to write efficient software, because they don't understand hardware and the relationship between the two. Even software written in higher-level languages like Python, Java and, heaven help us, C# benefits greatly if the programmer understands what is really going on behind the scenes. Adding more and more cores puts a tremendous burden on compiler writers, who already have enough headaches due to the current architecture of extremely deep pipelines and multiple cores with their dedicated L1 and shared L2 caches. Don't invoke Amdahl's law without really understanding it. It is one of the most misunderstood and incorrectly used laws in CS.
by PhattieM July 1, 2008 11:08 AM PDT
I think the idea Intel is trying to convey is that programmers should begin thinking about their software as a more abstract entity. This includes writing the code so that it scales to any number of processors.

However, as in all processes that stray from static naming -- there will be a decrease in overall performance. As such, the developer needs to be aware of the trade off and program accordingly.
by ZeZu420 July 1, 2008 11:18 AM PDT
More cores, in a graphic intensive environment ...
Sounds like this is more likely to be Intel's continued route toward its own graphics solution.
Delivering the memory bandwidth for this "properly" is going to be difficult and costly.

If we are speaking in terms of general purpose processing, my thoughts would go towards seeing instruction and memory latency issues reduced first. This could boost performance well over 300%, a massive change.
by rama2008 July 1, 2008 11:28 AM PDT
This is good news for graphics-intensive desktops, especially for real-time ray tracing engines. 3D graphics based on ray tracing techniques inherently possess parallelism, and ray tracing is far superior to traditional hardware-based rendering. Multicore desktops will allow graphics cards to be replaced completely by superior real-time 3D ray tracing software engines that take advantage of multicore systems. I suspect Intel is already doing research in this area. If real-time ray tracing can be brought to consumer desktops, everything changes in the way 3D games are developed and in the user experience.
by Mr Smith8 July 1, 2008 11:52 AM PDT
Use the ASCI Red design! The original incarnation of this machine used 9000+ Intel Pentium Pro processors, each clocked at 200 MHz. The original ASCI Red was the first computer on Earth to rate above 1 teraFLOPS on the MP-Linpack benchmark (1996).
by CompEng July 1, 2008 12:39 PM PDT
Most of us here are missing the point. The point is that Intel doesn't see sufficient growth opportunity in focusing solely on the incremental performance improvements that have been its traditional bread and butter. Meanwhile, it's seeing a huge threat from the GPGPU projects from DAAMIT/Nvidia and the increasing dollar share of the GPU in the desktop computer.
Marketing statements like this help fend off the threat, while causing relatively little collateral damage to its current best-selling products, where Intel is pretty competitive right now. Don't worry, Intel will keep pushing single-threaded performance, latency improvements, etc. AMD and now Via will make sure they do, even ignoring the low-power market for now. They just want it known that they plan on addressing the highly-scalable market too (graphics and HPC), and in a big way.
by nachurboy July 1, 2008 12:54 PM PDT
How would you parallelize serial computation? Parallel computing only works for computations or processes that are discrete, which is why you don't see as many everyday applications using parallel coding techniques. For example, a word processor may have 2-3 threads running at a time: one to handle UI and input, one to handle data, and one to handle background tasks such as spell checking. What else would you need to do in a separate thread? How would you break those three threads down into more threads? A game is another example where parallelism isn't transferable. I can see Intel selling massively multicore systems for real-time ray tracing, but I would think it'd be better to use a specialized GPU for that than to code a ray-tracing library for multicore processors, even though that's probably the best use of multicore. Bottom line, I think it's Intel's marketing spin to move people away from specialized processors toward general-purpose processors for everything, just because they have no other way to increase performance anymore.
by toorregnig July 2, 2008 3:55 PM PDT
Use a language that inherently separates the flow into work units. For instance, a massively parallelizable Perl could launch an independent thread for each of the loop bodies in

for (@array) { do_something_intensive($_) }

or, even better, LISP's "map" construct fits very well into scatter/gather mode, scaling up to as many processes as needed.

The key is to program functionally and not use alternate data paths (side effects) for result return.
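The suggestion here is the familiar scatter/gather pattern, and a minimal Python analogue follows. It is a sketch only: `intensive` is a hypothetical stand-in for `do_something_intensive`. The worker is a pure function, and results come back through the return values of `Executor.map`, in input order, rather than through a shared list that workers append to as a side effect. With CPython threads this shows the structure, not a guaranteed speedup. CPU-bound Python code would normally need a process pool instead.

```python
from concurrent.futures import ThreadPoolExecutor

def intensive(x):
    # Pure function: the result depends only on the argument, and
    # nothing outside the function is read or modified.
    return sum(i * i for i in range(x))

def scatter_gather(items, workers=8):
    # Scatter: each item becomes an independent task.
    # Gather: map() yields results in the order of `items`, even if
    # tasks finish out of order, so no shared result list is needed.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(intensive, items))

if __name__ == "__main__":
    data = [10, 1000, 5]
    print(scatter_gather(data))  # [285, 332833500, 30]
```

Had each worker appended to a shared list instead, the order of results would depend on thread timing, and the list would need protection.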
by Jedai-2 July 3, 2008 12:43 AM PDT
You're limited by your thread-oriented mindset; there's plenty of opportunity for parallelism that your average word processor isn't exploiting. Taking the example of spell checking a whole document, in modern languages it would be easy to spell check big chunks of the document simultaneously, creating hundreds of lightweight user threads automatically distributed across dozens of cores.
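The chunked spell-checking idea can be sketched in Python. The document's words are split into fixed-size chunks, and each chunk is checked independently. Each chunk's starting offset travels with it so the per-chunk results can be merged into document positions. The toy dictionary, chunk size and function names are all hypothetical. As with any CPython thread pool, the global interpreter lock means this illustrates the decomposition rather than delivering a speedup.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy dictionary standing in for a real word list.
DICTIONARY = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}
WORD = re.compile(r"[A-Za-z]+")

def check_chunk(words, start):
    # Return (document_word_index, word) for each unknown word in one
    # chunk; `start` is the chunk's offset within the whole document.
    return [(start + i, w) for i, w in enumerate(words)
            if w.lower() not in DICTIONARY]

def spell_check(text, chunk_size=1000, workers=8):
    words = WORD.findall(text)
    starts = range(0, len(words), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Every chunk is checked independently of every other chunk.
        parts = pool.map(lambda s: check_chunk(words[s:s + chunk_size], s), starts)
        # Chunks come back in order, so concatenation gives document order.
        return [hit for part in parts for hit in part]

if __name__ == "__main__":
    text = "The quikc brown fox jumsp over the lazy dog"
    print(spell_check(text, chunk_size=3))  # [(1, 'quikc'), (4, 'jumsp')]
```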

Games are even more parallel. Most AI should be parallel: an FPS has several characters to handle, strategy games could use battalion-level AI, and so on. Also, in the future, with multiple cores, games could try to handle more of the surrounding world than just what the player can see, which would make for a more realistic environment.

--
Jedaï
by Catalina588 July 1, 2008 2:05 PM PDT
For fifty years, operating systems developers have been using more and more sophisticated code to run more tasks (e.g., threads) more efficiently on more cores. One likely change in a world of 1,000+ cores is a total rethink of OS architecture. Maybe we end up with something like the IBM System i (AS/400) where the OS and a thread/task/program have their own virtual memory mapped core. And individual OS services get their cores as needed.

Anyway, I see big OS changes to harness a many-core world; I do not see a rewrite of the world's applications for massive parallelism in my lifetime.
by bobby_brady July 1, 2008 3:36 PM PDT
Oh I can't wait! With all the viruses and spyware out there infecting computers, multicore chips will do wonders to make computers faster. Each core can have its own malware program running while viruses take over a couple dozen. Spyware can infect the remaining cores.
by photography_on_the_run July 2, 2008 2:01 PM PDT
Word processors are currently serial applications because they are limited by past generations of processors. Start thinking about adding voice recognition, intelligent phrase fill-in, or even body language. Stop thinking small.
by eightwings July 2, 2008 3:06 PM PDT
Intel is full of it. Multithreading is not the answer to parallel computing and they know it. Managing thousands of threads is absurd to the extreme. Programmers have enough trouble juggling a few threads as it is. To find out why multithreading's days are numbered, read this article:

Parallel Computing: Why the Future Is Non-Algorithmic
http://rebelscience.blogspot.com/2008/05/parallel-computing-why-future-is-non.html
by rfielding July 2, 2008 5:58 PM PDT
People need to program efficiently. But once you have asymptotically optimal serial programs, you can only reduce waste by a (possibly large) constant factor.

To those invoking Amdahl's law: you should look up an equivalent formulation of the same equation called Gustafson's law. The biggest misunderstanding people have of Amdahl's law is assuming that the percentage of non-parallelizable instructions stays constant as the problem size grows. The theoretical maximum useful number of processors depends on the problem itself. Not having access to the full number of processors due to an arbitrary limit (e.g., 32) can cause PARALLELIZABLE programs to become useless on a machine with an insufficient number of processors.
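Gustafson's law can be made concrete with a short calculation. It measures the serial fraction s on the parallel machine and lets the parallel part of the problem grow with the number of cores n. The resulting scaled speedup is s + (1 - s) n. A minimal Python sketch (the function name is illustrative):

```python
def gustafson_speedup(serial_fraction, n):
    """Scaled speedup from Gustafson's law.

    serial_fraction: share of the run time *on the n-core machine* spent
    in serial code (0..1); the parallel share is assumed to grow with n.
    """
    if not 0.0 <= serial_fraction <= 1.0 or n < 1:
        raise ValueError("need 0 <= serial_fraction <= 1 and n >= 1")
    # n cores finish the scaled problem in s + (1 - s) = 1 time unit;
    # one core would need s + (1 - s) * n units for the same problem.
    return serial_fraction + (1.0 - serial_fraction) * n

if __name__ == "__main__":
    for cores in (8, 100, 1000):
        print(cores, gustafson_speedup(0.1, cores))
```

With s = 0.1 the scaled speedup on 1,000 cores is about 900x. Suppose instead that 10% were the serial share of a fixed-size problem measured on one core. Amdahl's law would then cap the speedup at 10x no matter how many cores were added. The two fractions are measured differently, and that difference is the misunderstanding described above.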

Even if we stay at a small number of cores (or are distributed among many systems for geographic reasons), the code that runs on them must eventually be compiler-checkable. I see nothing other than message-passing concurrency (Erlang) working in the long run; lock-based programming will never fully rise to this challenge because it creates too much coupling among components in the form of locking constraints and violates encapsulation in obscene ways.
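Erlang-style message passing can be approximated in Python with `queue.Queue` serving as a mailbox. This is only an approximation: Erlang processes are far lighter than OS threads, and `queue.Queue` uses locks internally. Application code, however, never touches a lock. The counter's state is owned by a single thread, and every other thread interacts with it only by sending messages. The message format and function names are illustrative.

```python
import queue
import threading

def counter_actor(mailbox):
    # The count lives only inside this thread; other threads change or
    # read it solely by sending messages, never by sharing memory.
    count = 0
    while True:
        msg = mailbox.get()
        if msg[0] == "add":
            count += msg[1]
        elif msg[0] == "get":
            msg[1].put(count)  # reply on the caller's own queue
        elif msg[0] == "stop":
            return

def run_demo(num_clients=4, adds_per_client=1000):
    mailbox = queue.Queue()
    actor = threading.Thread(target=counter_actor, args=(mailbox,))
    actor.start()

    def client():
        for _ in range(adds_per_client):
            mailbox.put(("add", 1))

    clients = [threading.Thread(target=client) for _ in range(num_clients)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()

    # The mailbox is FIFO with a single reader, so this request is handled
    # only after every "add" that the finished clients sent.
    reply = queue.Queue()
    mailbox.put(("get", reply))
    total = reply.get()
    mailbox.put(("stop",))
    actor.join()
    return total

if __name__ == "__main__":
    print(run_demo())  # 4000
```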

Amdahl's law applies to people doing tasks as well. Imagine a scenario in which a hundred people can equal the efforts of an entire country for many important tasks, but the manpower is just not available for the work that CAN be done in parallel. The work that is parallel will swamp the 100 workers for the small group, while the large country reduces the costs of the parallel workload to its theoretical minimum with its extra workers.

The future great programmers are those that can reduce a program to its theoretical minimum given an N processor machine, and figure out an optimal number of processors to implement on in terms of cost/performance. Big-O notation will be revised to include parallelization complexity along with space and serial time complexity.
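A close existing counterpart to this prediction is the work/span model, used for example in analyses of Cilk programs. Work (T1) is the total number of operations. Span (T∞) is the length of the longest chain of dependent operations. A greedy scheduler on P processors finishes within T1/P + T∞ steps. The Python sketch below counts both quantities for a pairwise tree reduction. The function names are illustrative, and the bound is a standard textbook result rather than something stated in the comment.

```python
def tree_sum(values):
    """Pairwise (tree) reduction that also reports work and span.

    Returns (total, work, span), where work is the number of additions
    and span is the number of dependent rounds (the critical path).
    """
    if not values:
        return 0, 0, 0
    level, work, span = list(values), 0, 0
    while len(level) > 1:
        # All additions within one round are independent of each other,
        # so with enough cores they could run at the same time.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element carries over to next round
        work += len(level) // 2
        span += 1
        level = nxt
    return level[0], work, span

def greedy_time_bound(work, span, p):
    # Upper bound on steps for a greedy scheduler with p processors.
    return work / p + span

if __name__ == "__main__":
    total, work, span = tree_sum(list(range(1000)))
    print(total, work, span)                 # 499500 999 10
    print(greedy_time_bound(work, span, 8))  # 134.875
```

Summing n numbers this way takes n - 1 additions of work but only about log2 n rounds of span. In this notation the sum is O(n) work and O(log n) span, which is the sort of "parallelization complexity" the comment anticipates.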
by frustum July 2, 2008 7:57 PM PDT
This is nothing more than good marketing on Intel's part. They've played their hand, there are no do-overs, the stockholders have been sold, and now it's time to sell it to the masses...unless of course you actually believe that Intel is in it for anything other than making money and increasing market share. Multi-core is all the rage right now and I can imagine it will continue for a while until 90% of end users come to the realization that the amount of software to exploit said rage is limited at best. After this brief period has ended, I can also imagine an even better (and more importantly) successful marketing stragedy at Intel: convince the masses that it's the fault of the software makers due to their apparent lack of development skills (a reliable real-world 1000+-thread application running on some accountant's Vista desktop? yeah sure, I can see it IF I WORKED AT INTEL). I suppose this might not be a problem if humans had multi-core brains, but for the time being we will go without, and as a result we'll be expected to solve this problem in reverse.

Since when does hardware mandate software? I would offer: ever since 90% of end users were left with the feeling that their hardware just hasn't quite lived up to their expectations these last few years. Sure, we have more transistors and multi-cores...so what? Wasn't that inevitable? Where's the great next thing? There isn't any, period (unless you're Intel and this stragedy actually works). It's business as usual, and end users will continue the cycle of upgrading from a1 to a2 because b2 just came out to replace b1, and in quarter 4 c3 will be 2x faster than c2 but will also require an upgrade from a2 to a3 as well as from b2 to b3. Meanwhile the guys in the trenches debugging those 1000+ threads will bear the heat; it's pure genius on Intel's part, assuming they really aren't as arrogant as I think and that they actually can predict the future.
by July 2, 2008 8:02 PM PDT
The human brain is massively parallel containing roughly 100 billion neurons (or "cores"), each linked to as many as 10,000 other neurons. It is running an operating system called consciousness that is able to rewrite itself and has been running smoothly without major updates for the last 40000 years. If you really start thinking about it you probably have not even begun to tap what is possible for parallel programming. Or for new user functionality.
by July 3, 2008 6:12 AM PDT
The problem with thousands of cores is not hardware related. Yes, Intel can build thousands of cores through continued innovation in hardware. However, the software community is still struggling to understand how to use even 8 cores. The concurrency issues are enormous. For example, look at the following post:
http://kashi.webhop.net/blog/Technology/index.php/archives/22
Until we solve the concurrency issues, the increased number of cores will not achieve any higher performance.
C++ is just starting to deal with concurrency. C++0x adds thread support; however, that is just the starting point.
by ronch79 July 8, 2008 7:36 AM PDT
If I were a software developer and heard Intel say something like this, I'd think they were just passing the blame to us because they can't go on pushing IPC further and further. Heck, getting more ILP (Instruction-Level Parallelism) is hard enough, but dividing a task across as many cores as there are stars in the sky (unbelievable TLP) seems astronomically difficult to me. Okay, most of us aren't smart enough to do it, but the fact that NASA has 50% of the world's Ph.D.'s while Intel has the rest kinda makes you think Intel should be lending developers a helping hand on this.

If and when it's finally achieved, computers by then would make today's computers look like UNIVACs and ENIACs.
by dtremaine November 25, 2008 8:17 AM PST
Azul Systems in Mountain View currently has a system in production that has 864 cores.
Don
About Nanotech - The Circuits Blog

Brooke Crothers was formerly editor-at-large at CNET News.com, an analyst at IDC (International Data Corp.) Japan, and an editor at The Asian Wall Street Journal Weekly (The Wall Street Journal, Dow Jones), among other endeavors, including a recent hiatus from the tech industry when he co-managed an after-school math and reading center. Nanotech covers computer chip technology and how it defines the computing experience. He is a member of the CNET Blog Network, and is not an employee of CNET. Disclosure.
