July 1, 2008 9:50 AM PDT

Intel says to prepare for 'thousands of cores'

Intel is telling software developers to start thinking about not just tens but thousands of processing cores.

Intel Tera-scale multicore research (Credit: Intel)

Intel currently offers quad-core processors and is expected to bring out a Nehalem processor in the fourth quarter that uses as many as eight cores.

But the chipmaker is now thinking well beyond the traditional processor in a PC or server. Jerry Bautista, co-director of the Tera-scale Computing Research Program at Intel, recently said that in a graphics-intensive environment, more cores are better as long as memory bandwidth keeps up: "The more cores we have the better. Provided that we can supply memory bandwidth to the device."

On Monday, an Intel engineer took this a step further. Writing in a blog post, Anwar Ghuloum, a principal engineer with Intel's Microprocessor Technology Lab, said: "Ultimately, the advice I'll offer is that...developers should start thinking about tens, hundreds, and thousands of cores now."

He said that Intel faces a challenge in "explaining how to tap into this performance." He continues: "Sometimes, the developers are trying to do the minimal amount of work they need to do to tap dual- and quad-core performance...I suppose this was the branch most discussions took a couple of years ago."

Now, however, Intel is increasingly "discussing how to scale performance to core counts that we aren't yet shipping...Dozens, hundreds, and even thousands of cores are not unusual design points around which the conversations meander," he said.

He says that the more radical programming path to tap into many processing cores "presents the 'opportunity' for a major refactoring of their code base, including changes in languages, libraries, and engineering methodologies and conventions they've adhered to for (often) most of their software's existence."

"Eventually, developers realize that the end point is on the other side of a mountain of silicon innovations...Program for as many cores as possible, even if it is more cores than are currently in shipping products."

Brooke Crothers is a former editor at large at CNET News.com and has been an editor for the Asian weekly version of the Wall Street Journal. He writes for the CNET Blog Network and is not a current employee of CNET. Contact him at mbcrothers@gmail.com.
Comments (25 total; first 20 shown)
by bluemist9999 July 1, 2008 10:24 AM PDT
Why not program for quantum-phase computers as well?

My point is this: by the time Intel has ramped up to dozens of cores, we may well be on to quantum computing or some other totally different paradigm.

I do agree we need to plan for highly scalable, secure, distributed systems, but there's a huge risk if we develop for a specific mindset, and the mindset changes before it's implemented.
by The_Decider July 1, 2008 10:43 AM PDT
IMO, the future of desktop chips is not larger and larger multicore designs but dual- or quad-core processors that improve in other areas, like cache, while the OS manages the cores more intelligently. The reasoning is that most desktop apps are not inherently parallel: you simply can't get them to run in parallel, or at least not very often. However, each core can be dedicated to certain tasks: in a dual-core setup, one for the OS and one for the apps; in a quad-core system, one for the OS, two for apps, and one for antivirus and antispyware (Windows only). This would also relieve the nontrivial problem of managing data across four L1 caches, the L2 cache, and main memory, which carries a lot of inherent complexity and overhead.

Hardware companies are constantly trying to build faster chips with more cores without stopping to ask 'Why are we doing this?' and 'Is this of any real benefit?'. There is no need for it on the desktop; software needs to dictate hardware, and hardware cannot dictate software. Progress for the sake of progress has given birth to craptastically horrible software design that relies on hardware getting faster and more efficient over time to hide its sloppy design and programming. Microsoft is only one smelly piece of rubbish in the huge pile of refuse that is the software landscape. It has also helped spread the toxic and odd idea of spending resources parallelizing software that might be only 10-15% parallel at best. That is a massive waste of resources with no real benefit to the end user.

Of course, once you leave the desktop there is a need for chips that can handle the hardware complexity associated with parallelism, and that is where the real benefit is seen.
by rapier1 July 1, 2008 10:48 AM PDT
Having more cores is wonderful, assuming that you have a problem that is inherently and largely parallel in the first place. However, many problems are not inherently parallel, or the share of the application that can be parallelized shrinks significantly once you account for all the attendant support code. I really suggest people look up Amdahl's Law. Not only do we need more cores, we will continue to need faster cores to improve computing power.
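Amdahl's Law puts a number on this: if a fraction p of a program's running time can be parallelized, the best possible speedup on n cores is 1 / ((1 - p) + p/n). A minimal Python sketch of that formula (the function name `amdahl_speedup` is illustrative, and the model ignores real-world costs such as synchronization and memory bandwidth):

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Upper bound on speedup for a fixed-size workload (Amdahl's Law).

    p: fraction of the original run time that can be parallelized (0..1)
    n: number of cores
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Serial part is unchanged; the parallel part is divided across n cores.
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    # With 90% parallel code, speedup can never exceed 1 / 0.1 = 10x,
    # no matter how many cores are added.
    for cores in (2, 8, 100, 1000):
        print(f"{cores:5d} cores: {amdahl_speedup(0.9, cores):.2f}x")
```

For a 90% parallel program this prints about 1.82x, 4.71x, 9.17x and 9.91x: going from 100 to 1,000 cores buys less than one extra "x", which is why faster individual cores still matter.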
by PhattieM July 1, 2008 11:08 AM PDT
I think the idea Intel is trying to convey is that programmers should begin thinking about their software as a more abstract entity. This includes writing the code so that it scales to any number of processors.

However, as with all processes that stray from static naming, there will be a decrease in overall performance. As such, the developer needs to be aware of the trade-off and program accordingly.
by ZeZu420 July 1, 2008 11:18 AM PDT
More cores, in a graphics-intensive environment...
Sounds like this is more likely Intel's continued route toward its own graphics solution.
Delivering the memory bandwidth for this "properly" is going to be difficult and costly.

If we are speaking in terms of general-purpose processing, I would rather see instruction and memory latency issues reduced first. This could boost performance by well over 300%, a massive change.
by rama2008 July 1, 2008 11:28 AM PDT
This is good news for graphics-intensive desktops, especially for real-time ray-tracing engines. 3D graphics based on ray-tracing techniques inherently possess parallelism, and ray tracing is far superior to traditional hardware-based graphics rendering. Multicore desktops will allow hardware graphics cards to be replaced entirely by superior real-time 3D ray-tracing software engines that take advantage of multicore systems. I suspect Intel may already be doing research in this area. If real-time ray tracing can be brought to consumer desktops, everything changes in the way 3D games are developed and in the user experience.
by Mr Smith8 July 1, 2008 11:52 AM PDT
Use the ASCI Red design! The original incarnation of this machine used 9000+ Intel Pentium Pro processors, each clocked at 200 MHz. The original ASCI Red was the first computer on Earth to rate above 1 teraFLOPS on the MP-Linpack benchmark (1996).
by CompEng July 1, 2008 12:39 PM PDT
Most of us here are missing the point. The point is that Intel doesn't see sufficient growth opportunity in focusing solely on the incremental performance improvements that have been its traditional bread and butter. Meanwhile, it's seeing a huge threat from the GPGPU projects from DAAMIT/Nvidia and from the GPU's increasing dollar share of the desktop computer.
Marketing statements like this help fend off the threat while causing relatively little collateral damage to its current best-selling products, where Intel is pretty competitive right now. Don't worry, Intel will keep pushing single-threaded performance, latency improvements, etc. AMD and now Via will make sure they do, even ignoring the low-power market for now. Intel just wants it known that it plans on addressing the highly scalable market too (graphics and HPC), and in a big way.
by nachurboy July 1, 2008 12:54 PM PDT
How would you parallelize serial computation? Parallel computing only works for computations or processes that are discrete, which is why you don't see as many day-to-day applications using parallel coding techniques. For example, a word processor may have 2-3 threads running at a time: one to handle UI and input, one to handle data, and one to handle background tasks such as spell checking. What else would you need to do in a separate thread? How would you break those three threads down into more threads? A game is another example where parallelism isn't transferable. I can see Intel selling massively multicore systems for real-time ray tracing, but I would think it'd be better to use a specialized GPU for that than to code a ray-tracing library for multicore processors, even though that's probably the best use of multicore. Bottom line, I think it's Intel's marketing spin to move people away from specialized processors to general-purpose processors, just because they have no other way to increase performance anymore.
by Catalina588 July 1, 2008 2:05 PM PDT
For fifty years, operating system developers have been using more and more sophisticated code to run more tasks (e.g., threads) more efficiently on more cores. One likely change in a world of 1,000+ cores is a total rethink of OS architecture. Maybe we end up with something like the IBM System i (AS/400), where the OS and each thread/task/program have their own virtual-memory-mapped core, and individual OS services get their own cores as needed.

Anyway, I see big OS changes to harness a many-core world; I do not see a rewrite of the world's applications for massive parallelism in my lifetime.
by bobby_brady July 1, 2008 3:36 PM PDT
Oh, I can't wait! With all the viruses and spyware out there infecting computers, multicore chips will do wonders to make computers faster. Each core can have its own malware program running while viruses take over a couple dozen, and spyware can infect the remaining cores.
by photography_on_the_run July 2, 2008 2:01 PM PDT
Word processors are currently serial applications because they are limited by past generations of processors. Start thinking about adding voice recognition, intelligent phrase fill-in, or even body-language input. Stop thinking small.
by eightwings July 2, 2008 3:06 PM PDT
Intel is full of it. Multithreading is not the answer to parallel computing, and they know it. Managing thousands of threads is absurd in the extreme. Programmers have enough trouble juggling a few threads as it is. To find out why multithreading's days are numbered, read this article:

Parallel Computing: Why the Future Is Non-Algorithmic
http://rebelscience.blogspot.com/2008/05/parallel-computing-why-future-is-non.html
by rfielding July 2, 2008 5:58 PM PDT
People need to program efficiently. But once you have asymptotically optimal serial programs, you can only reduce waste by a (possibly large) constant factor.

To those invoking Amdahl's law: look up an equivalent formulation of the same equation called Gustafson's law. The biggest misunderstanding people have of Amdahl's law is assuming that the percentage of non-parallelizable instructions stays constant as the problem size grows. The theoretical maximum useful number of processors depends on the problem itself. Not having access to the full number of processors due to an arbitrary limit (e.g., 32) can make PARALLELIZABLE programs useless on a machine with an insufficient number of processors.
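Gustafson's law covers the case where the workload grows with the machine: if s is the fraction of the parallel run's time spent in serial code on N processors, the scaled speedup is S = s + (1 - s)·N = N - s·(N - 1). A minimal Python sketch of that formula (the function name is hypothetical, and it assumes the parallel part scales perfectly with the problem size):

```python
def gustafson_speedup(serial_fraction: float, n: int) -> float:
    """Scaled speedup per Gustafson's law.

    serial_fraction: share of the *parallel* run's time spent in serial code
    n: number of processors; the problem size is assumed to grow with n
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Equivalent to serial_fraction + (1 - serial_fraction) * n.
    return n - serial_fraction * (n - 1)


if __name__ == "__main__":
    for cores in (8, 100, 1000):
        print(f"{cores:5d} cores: {gustafson_speedup(0.1, cores):.1f}x")
```

With a 10% serial share this prints 7.3x, 90.1x and 900.1x: because the problem is allowed to grow, speedup keeps rising almost linearly instead of flattening out near 10x as Amdahl's fixed-size bound does for a 90% parallel program.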

Even if we stay at a small number of cores (or are distributed among many systems for geographic reasons), the code that runs on them must eventually be compiler-checkable. I see nothing other than message-passing concurrency (Erlang) working in the long run; lock-based programming will never fully rise to this challenge because it creates too much coupling among components in the form of locking constraints and violates encapsulation in obscene ways.
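The message-passing style described here can be approximated even outside Erlang. A minimal Python sketch, assuming worker threads that share no mutable state beyond thread-safe `queue.Queue` mailboxes; the `STOP` sentinel protocol is a hypothetical convention, and a single shared inbox stands in for Erlang's per-process mailboxes. The point is the communication style, not speed: CPython threads do not execute Python bytecode in parallel.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel message that tells a worker to exit


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Receive numbers and reply with their squares.

    The only shared objects are the two thread-safe queues; no locks
    appear in user code.
    """
    while True:
        msg = inbox.get()
        if msg is STOP:
            break
        outbox.put(msg * msg)


def run(numbers, n_workers: int = 4) -> list:
    numbers = list(numbers)
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(inbox, outbox))
        for _ in range(n_workers)
    ]
    for t in workers:
        t.start()
    for x in numbers:
        inbox.put(x)  # send a message instead of writing to shared data
    for _ in workers:
        inbox.put(STOP)  # one stop message per worker
    for t in workers:
        t.join()
    # Replies arrive in a nondeterministic order, so sort for a stable result.
    return sorted(outbox.get() for _ in numbers)


if __name__ == "__main__":
    print(run(range(10)))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Because each worker only reacts to messages, adding workers changes nothing but the `n_workers` argument; no locking constraints leak between components.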

Amdahl's law applies to people doing tasks as well. Imagine a scenario in which a hundred people can equal the efforts of an entire country for many important tasks, but the manpower is just not available for the work that CAN be done in parallel. The parallel work will swamp the 100 workers in the small group, while the large country, with its extra workers, reduces the cost of the parallel workload to its theoretical minimum.

The great programmers of the future will be those who can reduce a program to its theoretical minimum on an N-processor machine and work out the optimal number of processors to target in terms of cost/performance. Big-O notation will be revised to include parallelization complexity along with space and serial time complexity.
by frustum July 2, 2008 7:57 PM PDT
This is nothing more than good marketing on Intel's part. They've played their hand, there are no do-overs, the stockholders have been sold, and now it's time to sell it to the masses... unless, of course, you actually believe that Intel is in it for anything other than making money and increasing market share. Multicore is all the rage right now, and I imagine it will continue for a while until 90% of end users realize that the amount of software able to exploit said rage is limited at best.

After this brief period has ended, I can also imagine an even better (and, more importantly, successful) marketing strategy at Intel: convince the masses that it's the fault of the software makers and their apparent lack of development skills. (A reliable, real-world, 1000+ thread application running on some accountant's Vista desktop? Yeah, sure, I can see it IF I WORKED AT INTEL.) I suppose this might not be a problem if humans had multicore brains, but for the time being we will go without, and as a result we'll be expected to solve this problem in reverse.

Since when does hardware mandate software? I would offer: ever since 90% of end users were left feeling that their hardware just hasn't quite lived up to their expectations these last few years. Sure, we have more transistors and multiple cores... so what? Wasn't that inevitable? Where's the next great thing? There isn't one, period (unless you're Intel and this strategy actually works). It's business as usual, and end users will continue the cycle of upgrading from a1 to a2 because b2 just came out to replace b1, and in Q4 c3 will be 2x faster than c2 but will also require upgrades from a2 to a3 and from b2 to b3. Meanwhile, the guys in the trenches debugging those 1000+ threads will take the heat. It's pure genius on Intel's part, assuming they really aren't as arrogant as I think and that they actually can predict the future.
by July 2, 2008 8:02 PM PDT
The human brain is massively parallel, containing roughly 100 billion neurons (or "cores"), each linked to as many as 10,000 other neurons. It runs an operating system called consciousness that can rewrite itself and has been running smoothly without major updates for the last 40,000 years. If you really start thinking about it, you probably have not even begun to tap what is possible for parallel programming, or for new user functionality.
by July 3, 2008 6:12 AM PDT
The problem with thousands of cores is not hardware-related. Yes, Intel can build thousands of cores through continued hardware innovation. However, the software community is still struggling to understand how to use even 8 cores. The concurrency issues are enormous. For example, look at the following post:
http://kashi.webhop.net/blog/Technology/index.php/archives/22
Until we solve the concurrency issues, the increased number of cores will not deliver any higher performance.
C++ is just starting to deal with concurrency. C++0x adds thread support; however, that is just the starting point.
by ronch79 July 8, 2008 7:36 AM PDT
If I were a software developer and heard Intel say something like this, I'd think they're just passing the blame to us because they can't keep pushing IPC further and further. Heck, getting more ILP (instruction-level parallelism) is hard enough, but dividing a task across as many cores as there are stars in the sky (unbelievable TLP) seems astronomically difficult to me. Okay, most of us aren't smart enough to do it, but the fact that NASA has 50% of the world's Ph.D.s while Intel has the rest kind of makes you think Intel should be lending developers a helping hand on this.

If and when it's finally achieved, the computers of that day will make today's computers look like UNIVACs and ENIACs.
by ilya_cilk July 10, 2008 7:22 AM PDT
I see a couple things a bit differently:

1) First, I am more optimistic than "most desktop apps are not inherently parallel". The good news is that the compute-intensive portions of an app, which are often the bottleneck and shape the user experience, have plenty of parallelism available. Think about a ray-tracing app, the meshing algorithm in a finite element package, the image processing tasks in a photo editing app, etc. Of all the FOR loops out there, a large portion can run in parallel, without needing information from a previous iteration. So I would argue that there's plenty of parallelism that, if exposed, can materially improve user experiences.

Here's one blog post related to this topic: http://www.cilk.com/multicore-blog/bid/5365/What-the-is-Parallelism-Anyhow

2) Second, I am less optimistic about functionally decomposing an application ("However, each chip can be dedicated to certain tasks"). Manually programming the threads can be pretty complicated, may not balance the load well, and may need to be rewritten for each microprocessor generation.

Here's a post addressing this topic in more detail: http://www.cilk.com/multicore-blog/bid/5847/The-Folly-Of-Do-It-Yourself-Multithreading
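A loop like the ones in point 1, where no iteration needs results from a previous one, can be handed to a pool of workers rather than split into hand-written, task-specific threads as criticized in point 2. A minimal Python sketch using the standard `concurrent.futures` module (not Cilk, and without Cilk's work-stealing scheduler); `brighten` is a hypothetical per-pixel operation standing in for image processing:

```python
import concurrent.futures


def brighten(pixel: int, amount: int = 40) -> int:
    """Hypothetical per-pixel operation; each call depends only on its input."""
    return min(pixel + amount, 255)


def serial(pixels):
    """Plain loop: iterations are independent of one another."""
    return [brighten(p) for p in pixels]


def parallel(pixels, executor_cls=concurrent.futures.ProcessPoolExecutor):
    """Same loop, with iterations handed to a worker pool.

    Idle workers pick up the next chunk of iterations, so the code does not
    hard-wire one thread per task; a ProcessPoolExecutor is sized to the
    machine's core count by default. Results come back in input order.
    """
    with executor_cls() as pool:
        return list(pool.map(brighten, pixels, chunksize=1024))


if __name__ == "__main__":
    image = [i % 256 for i in range(100_000)]
    assert parallel(image) == serial(image)
    print("parallel and serial results match")
```

The loop body is written once; the same code runs on 2 cores or 64 without being restructured for each processor generation. Passing `concurrent.futures.ThreadPoolExecutor` as `executor_cls` suits I/O-bound work, but in CPython it will not speed up CPU-bound loops like this one.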
About Nanotech: The Circuits Blog

Brooke Crothers was formerly editor-at-large at CNET News.com, an analyst at IDC (International Data Corp.) Japan, and an editor at The Asian Wall Street Journal Weekly (The Wall Street Journal, Dow Jones), among other endeavors, including a recent hiatus from the tech industry when he co-managed an after-school math and reading center. Nanotech covers computer chip technology and how it defines the computing experience. He is a member of the CNET Blog Network and is not an employee of CNET.
