Personal computers have become much more reliable over the last 10 years or so, mostly due to the introduction of advanced operating systems with memory protection and hardware abstraction. The hardware itself has gotten better too; uncorrectable random errors are rare in PCs and extraordinarily rare in server-class systems.
These and other improvements have largely eliminated machine crashes. Blue-screen errors on Windows and kernel panics in Linux and Mac OS X still occur, but much more rarely.
Error-reporting services have become common, helping software developers figure out what went wrong. Most large developers now issue regular patches to fix newly discovered bugs, making systems more reliable between major releases.
All this progress is wonderful, of course, but our PCs still aren't reliable in the way that other consumer products are reliable. Machine crashes are still possible, and any bug can bring down an individual application.
Automobiles, for example, can fail in many ways, but they are still much more reliable than PCs. The risks associated with vehicle failures have been greatly reduced by decades of design refinements. Would you feel safe if PC technology controlled the steering and brakes in your car? Conversely, wouldn't you be more confident in your PC if you knew it was as reliable as your vehicle?
Can you rely on your system to display this 370-megapixel image? (Credit: European Southern Observatory (ESO))
PCs are also fragile in response to change. I know I'm always a little nervous the first time I install a new device driver or run a new application. Even without software changes, opening an unusually large image can induce some trepidation. Consider this 370-megapixel image of the Lagoon Nebula available from the European Southern Observatory Web site; how confident are you that all of your image-viewing programs would survive the attempt to open it?
And worst of all, PCs are fragile in response to attack. The kinds of problems that are sometimes created accidentally by software bugs are relatively easy to create on purpose.
Minimizing the frequency and consequences of these problems would require tremendous effort from everyone in the industry. Almost every bit of PC hardware and software would have to change. One part of the solution is an extension of the same techniques that make today's PCs more reliable than older models: more hardware-based isolation of one function from another.
The minimal isolation of today's systems is very convenient for software developers, making it easier to write code and achieve high levels of performance. More isolation means more complexity and more overhead, but it improves reliability.
Developers are taking the first steps in this direction already, for example, with the process isolation features of the Microsoft Internet Explorer 8 and Google Chrome browsers. But there's much more that can be done.
Another way to improve reliability is to verify that data and addresses are consistent in range and format with the original intent of the software developer before they are used by the program. Making these checks in software can help; the incidence of failures related to accidental and deliberate buffer-overflow conditions has been dramatically reduced in this way. There's plenty of room for new hardware to help in this process too.
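As a minimal sketch of this kind of software check, the Python function below validates a length field against the data actually available before using it. The record layout (a 4-byte big-endian length followed by a payload) and the name `read_record` are assumptions made up for this illustration; they don't come from any particular program.

```python
import struct

HEADER_SIZE = 4  # hypothetical layout: 4-byte big-endian length, then payload


def read_record(buf: bytes) -> bytes:
    """Return the payload of one length-prefixed record, or raise ValueError."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("buffer too short to hold a length header")
    (declared_length,) = struct.unpack_from(">I", buf, 0)
    # The check that matters: never trust the declared length until it has
    # been compared against the number of bytes that are really there.
    if declared_length > len(buf) - HEADER_SIZE:
        raise ValueError("declared length exceeds available data")
    return buf[HEADER_SIZE:HEADER_SIZE + declared_length]
```

Without the second check, a Python slice would quietly return a short payload; in a language without bounds checking, the same mistake reads or writes past the end of the buffer.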
There's also work to be done in making it easier to recover from failures, since true hardware failures are inevitable. This is another area where some high-end systems are way ahead of the PC. Fault-tolerant machine architectures have been around for a long time in the aerospace industry, for example.
Historically, fault tolerance has never been practical on the PC because PCs always had only one of each critical subsystem: one processor, one bank of memory, one display channel. Today, PC processors and graphics chips have multiple cores and multiple memory interfaces, creating the potential for redundant operation where it's most needed.
Recoverability also implies backups--not just of the contents of disk drives, but even of the live data in memory through checkpointing. And disk backups can be improved too, by making the backup process an integral part of all disk I/O. Modern file systems use journaling to increase reliability; this technique can be extended to allow recovering from errors long after they occur.
There will be a heavy price to be paid in complexity and performance for all of these techniques, but the currency for this payment is transistors, and Moore's Law gives us more of those in every new process generation. We need to consider how we want to allocate these transistors. Over time, I believe reliability should account for an increasing portion of them.
After 19 months of consulting--in Silicon Valley, we prefer that term to "unemployment"--I've accepted a job.
Once I start, I'll have to stop blogging. But while I'm still independent, I'd like to wrap up here by offering a short series of articles addressing several key topics in the area of personal computing.
Today, the topic is energy efficiency.
Energy efficiency has become a major selling point of today's personal computers, especially laptops, because power consumption determines battery life.
Unfortunately, laptops are being optimized for energy efficiency in a way that isn't fully consistent with the needs of laptop users.
Advances in process technology and CPU design have greatly improved the power efficiency of modern microprocessors when they're running. This improvement is most visible at the highest performance levels.
Over the last few years, dual-core laptop processors have gone from maximum speeds of roughly 2.4GHz to 3.0GHz without consuming any more power. The newest quad-core chips provide much more aggregate performance in a similar power envelope.
This improvement in operating efficiency is great for gaming, mobile video editing, and a few other applications. But it's not very meaningful for most consumers.
What the rest of us need is non-operating efficiency, the ability of the laptop to consume very little power when it isn't doing much because that's what our laptops are usually doing.
We need laptops that can do nothing--more efficiently.
I've been looking at the newest crop of ultra low-power laptops. Based on published benchmark data, they consume an average of 8W to 10W of power when doing essentially nothing (what we call "idle power"). Even the best of them consumes about 6W of power at all times, getting 10 hours of battery life from a 60Wh battery. Maybe 2W of that is spent keeping the display on. The other 4W to 8W is just wasted by the CPU and other motherboard circuitry.
When your laptop isn't doing much--for example, when you're typing in your word processor--it's using only slightly more CPU performance than your cell phone is when you're texting. Your cell phone consumes very little power to do this meager amount of work, usually no more than 0.25W or so for the CPU and its support chips. The corresponding elements of your laptop, however, may consume 50 times as much power under similar conditions.
Some of this difference is inevitable; your laptop has wider data buses, more and faster RAM, and so on. Nevertheless, your laptop motherboard could be designed to idle along on 1W or so.
That would give you a total system-level power consumption of around 3W--half the power of today's most energy-efficient laptops and about one-quarter the power of an average machine. Because there's a relationship between peak CPU speed and idle power, today's fastest laptops consume 20W or more at idle. With more energy-aware designs, these systems could see even greater proportional reductions.
In other words, adopting more aggressive methods for reducing idle power could easily double battery life across the board, and some systems would see much bigger improvements.
This is not merely a quantitative improvement. Consider what happens when your laptop can comfortably operate for 20 hours with the display on, or 60 hours with the display off.
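The 20-hour and 60-hour figures are simple division: battery capacity in watt-hours over steady power draw in watts. Here is a short Python sketch using the numbers quoted in this post (a 60Wh battery, roughly 2W for the display, 4W of motherboard power in the best current laptop, and a 1W target). The function name is mine, and steady draw is a simplifying assumption.

```python
def battery_life_hours(capacity_wh: float, power_w: float) -> float:
    """Runtime in hours for a battery of capacity_wh at a steady draw of power_w."""
    if power_w <= 0:
        raise ValueError("power draw must be positive")
    return capacity_wh / power_w


BATTERY_WH = 60.0  # battery size quoted in the text
DISPLAY_W = 2.0    # the text's rough estimate for keeping the display on

for label, board_w in [("best current laptop", 4.0), ("1W motherboard", 1.0)]:
    display_on = battery_life_hours(BATTERY_WH, board_w + DISPLAY_W)
    display_off = battery_life_hours(BATTERY_WH, board_w)
    print(f"{label}: {display_on:.0f} h display on, {display_off:.0f} h display off")
```

Real batteries and workloads aren't this tidy, but the ratio is the point: every watt removed from idle power matters more as the total gets smaller.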
For one thing, it never has to go to sleep. Your cell phone never really goes to sleep, and that's a great part of its value. Your laptop can have this same cell phone operating model.
Closing the lid should turn off the display, but the machine should keep running. It can stay connected to the Internet over Wi-Fi or 3G, periodically download your new e-mail messages, watch that eBay auction, and do whatever else you need it to do...all the time. Just plug it in to recharge while you're asleep. (If the laptop is in your briefcase, it'll have to slow down a lot to keep from consuming too much power, but that's easily managed.)
When you're ready to start using the machine actively again, it shouldn't take any longer to turn the display on again than it does to physically open the lid. Think "always on," not "instant on."
All of this is possible with today's technology, but nobody's doing it. I think one of the reasons we don't see this usage model is that laptop buyers don't know to ask for it. Incremental improvements produce adequate sales figures with each new laptop generation, and everyone figures that's good enough.
But mark my words: the first full-function laptop that works like a cell phone--always running, always connected, always ready--is going to hit the market like a sledgehammer. Everything else is going to seem obsolete overnight.
In part 1 and part 2 of this series, I claimed that there is apparently a secret rule in the microprocessor industry that determines the success--or failure--of new chip designs.
The failures included RISC processors, media processors, and intelligent RAM chips, which all sank in spite of clearly demonstrable advantages over alternative solutions. The great success is the programmable graphics processing unit (GPU), which has succeeded in spite of the sometimes wrenching shifts in programming methods and PC system architecture that have been required to support it.
So what's the secret? Simply this: a factor-of-two advantage, even if it's an inherent, persistent advantage, isn't enough to unseat an incumbent solution in the face of even the mildest competitive disadvantage. Without a factor of 10--a full order of magnitude--a new product won't even get a foot in the door.
That's why I call this rule the "factor factor." It isn't enough to be a few times faster than the existing alternatives. Given the performance consequences of Moore's Law, it's easier for your potential customers to wait a few years rather than spend a few years adapting to your "issues." You need to be much faster than the products you're trying to replace. The target factor is 10--no less.
Sometimes, even a tenfold advantage isn't enough. One order of magnitude is enough to overcome one disadvantage, such as a change of programming methods. Add another simultaneous disadvantage, however, like the serious constraint in local memory capacity imposed by the IRAM concept, and the new technology may need a factor of 100 in performance to win a place in the market.
Overall, a new product must deliver net benefits amounting to as much as a full order of magnitude in cost, performance, or productivity to compensate for each significant disadvantage. That's just what it takes to motivate customers to deal with the problems rather than waiting for Moore's Law to speed up the solutions that are already familiar to them.
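Stated as arithmetic, the rule is one order of magnitude per significant disadvantage. The Python sketch below is just an illustration of that rule of thumb; the function names are invented here, and deciding what counts as a "significant disadvantage" is still a judgment call.

```python
def required_advantage(disadvantages: int) -> int:
    """Net benefit factor the rule of thumb asks for: 10 per significant drawback."""
    if disadvantages < 0:
        raise ValueError("number of disadvantages cannot be negative")
    return 10 ** disadvantages


def clears_the_bar(claimed_factor: float, disadvantages: int) -> bool:
    """True if the claimed benefit meets or exceeds the required factor."""
    return claimed_factor >= required_advantage(disadvantages)


# A 2x advantage with one drawback (say, a new programming model) falls short;
# the same 2x with no drawbacks at all is plenty.
print(clears_the_bar(2, 1), clears_the_bar(2, 0))
```

With two simultaneous drawbacks, as in the IRAM example, `required_advantage(2)` gives the factor of 100 mentioned above.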
The introduction of the AMD64 instruction set by Advanced Micro Devices (also known as EM64T or "Intel 64" on Intel processors, or generically as x86-64) represents the ultimate success case for the factor factor.
AMD's Athlon 64 debuted the AMD64 instruction-set architecture. (Credit: Advanced Micro Devices)
This isn't immediately clear, I suppose. Adopting the AMD64 standard required a lot of work by operating system vendors and software developers, and the performance benefit was relatively mild in most cases. But still, AMD64 was an immediate success because the performance benefit in certain applications--those that simply wouldn't fit into a 32-bit address space--was practically infinite.
Although the factor factor seems obvious--or at least it should--it's still at the heart of many failed products and hundreds of millions of dollars of wasted investments every year.
In Silicon Valley, as in other chip-design centers around the world, projects rarely fail because of poor execution. In most projects, the engineers are good at their jobs, the managers are good at coordinating their work, and the investment is sufficient to get the work done.
Most projects fail at the conceptual level, before the detail design work even begins. The factor factor is only one of many reasons for these failures, of course, but it's the one that disturbs me the most because it's the easiest to anticipate.
This rule doesn't apply to all products. When a new chip for an existing market is architecturally compatible with previous products, a factor-of-two performance improvement is plenty. Even smaller benefits can justify the costs of developing a new product if there are few, if any, disadvantages associated with it.
Multicore CPUs are one of these products, at least for now. Process technology makes it pretty easy to double core counts. Dual-core CPUs were almost a drop-in replacement for single-core chips and caused no serious problems. Quad-core chips were the same thing again. Eight-core CPUs may be a lesson in diminishing returns, but I'm sure they'll be commercially successful.
Beyond that, we'll have to see how it goes. The critical advantage of the CPU over the GPU is high performance on inherently serial processing tasks (what we sometimes call "single-threaded applications"). On a typical PC, there are rarely more than a few of these tasks running at any given moment. It's always useful to have a few extra cores available for parallel tasks, but at some point (I'm thinking somewhere around the 16-core level), PC buyers are likely to stop paying extra for more cores.
Even mighty Intel could find itself on the wrong side of the factor factor. Given that quad-core chips became a mainstream product just this year, we can expect to see 16-core processors for ordinary desktop PCs in 2013 and laptops in 2015 or so. By that time, the GPU could be the incumbent solution for high-performance parallel processing, and multicore CPUs could be the technology looking for compelling performance advantages.
So...now you know the supposed secret. When you hear about a radical new microprocessor architecture, you can do what I do: imagine the numeral "1" followed by a "0" for each drawback you see in the proposal. Compare that figure with the claimed benefits and you'll know which way to bet.
By the way, kudos to CNET users divisionbyzero and TrinityTrident, who proved my point that this rule isn't really a secret by explaining it in their comments on the previous posts in this three-part series.
Now if someone could only explain why so many companies don't seem to know this rule!
Intel promotes the Turbo Boost technology in its new Core i7 Mobile processors as a way to adapt to the needs of the software and get more performance from the chip, but this isn't the real reason the technology exists.
The new "Clarksfield" Core i7 Mobile processors introduced at the Intel Developer Forum last week are certainly very impressive. They're huge high-performance quad-core chips with Hyper-Threading, support for two channels of DDR3-1333 DRAM, and an on-die PCI Express controller for the fastest possible connection to discrete graphics chips.
Intel VP Mooly Eden shows off the new Core i7 Mobile processor and its companion I/O controller at the Intel Developer Forum. (Credit: Intel)
In his IDF session announcing these parts, Intel Vice President Mooly Eden said the best of them, the 2GHz Core i7-920XM Extreme Edition, is "the fastest quad-core processor, the fastest dual-core processor, and the fastest single-core processor"--all in one chip.
The key to this dramatic claim is a feature called Turbo Boost technology. Basically, if the current application workload isn't keeping all four cores fully busy and pushing right up against the chip's TDP (Thermal Design Power) limit, Turbo Boost can increase the clock speed of each core individually to get more performance out of the chip.
It's easy to see how this works when just one or two cores are being actively used; whatever power the other two or three cores would have consumed can be redirected over to the active cores, allowing them to run at higher speeds.
The quad-core mode of Turbo Boost is a little more subtle; it works when the four cores aren't running a worst-case workload--for example, integer-heavy processing, since it's generally floating-point calculations that consume the most power--so they aren't bumping into the TDP limit. Turbo Boost can increase the frequency of all four cores until they're running as fast as they can for the current workload.
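To see why idle or lightly loaded cores leave room for higher clocks, here is a deliberately crude toy model in Python. It is not Intel's algorithm. Only the 2GHz base clock comes from this post; the 55W budget, the 133MHz step, the 3.2GHz ceiling, and the per-core wattages are illustrative placeholders. The model also assumes idle cores draw nothing and that power grows linearly with clock speed, when in practice power rises faster than that because voltage must go up too.

```python
def boosted_clock_mhz(active_cores: int, watts_per_core_at_base: float, *,
                      base_mhz: int = 2000, step_mhz: int = 133,
                      max_mhz: int = 3200, tdp_watts: float = 55.0) -> int:
    """Raise the clock in fixed steps while the active cores fit the power budget."""
    if active_cores < 1:
        raise ValueError("at least one core must be active")
    clock = base_mhz
    while clock + step_mhz <= max_mhz:
        candidate = clock + step_mhz
        # Toy power model: each active core's draw scales linearly with clock.
        power = active_cores * watts_per_core_at_base * candidate / base_mhz
        if power > tdp_watts:
            break  # the next step would exceed the budget
        clock = candidate
    return clock


# Four cores on a worst-case workload already fill the budget: no boost.
print(boosted_clock_mhz(4, 13.75))
# One active core inherits the idle cores' share and climbs to the ceiling.
print(boosted_clock_mhz(1, 13.75))
# Four cores on a lighter (placeholder 10W per core) workload get a partial boost.
print(boosted_clock_mhz(4, 10.0))
```

The specific clock speeds this prints mean nothing; the shape of the result does. The fewer cores that are busy, or the lighter the work they're doing, the more headroom there is under the same limit.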
Eden said that the Turbo Boost controller ...
The mysteries of the Lynnfield and Jasper Forest die photos (from last week's post titled "Investigating Intel's Lynnfield mysteries") were all cleared up at the Intel Developer Forum last week, and as expected, there was nothing sinister going on--just some confusion in Intel's graphic arts department.
With the help of the always-helpful George Alfs of Intel's press relations department and Intel vice president Mooly Eden (general manager of Intel's PC Client Group), we got everything straightened out. Literally!
Here's the die photo of Intel's Lynnfield chip from my previous post:
Die photo of the Core i5/Core i7 processor code-named Lynnfield, with labels. (Credit: Intel)
This is the newest (shipping) part based on the Nehalem microarchitecture, differing from the earlier Bloomfield by the addition of an on-die PCI Express controller. Both chips are made in Intel's 45nm process technology.
According to Eden, the Lynnfield chip design is shared with several other Intel chips that will be on the market soon, including ...
How would you like a single-chip microprocessor with more than four times the performance (on some applications) of Intel's best Core i7?
Then consider that up to 32 of these chips can be directly connected to form a single server, achieving four times the built-in scalability of Intel's next-generation Nehalem-EX processor.
That's IBM's widely anticipated Power7, which it described at last week's Hot Chips conference. But if you're interested, you'd better be prepared to spend a lot more than four times as much per chip. IBM isn't talking about pricing, but large Power servers can cost more than $10,000 per processor.
IBM's forthcoming Power7 server processor has eight cores, manages 32 threads, and includes 32MB of on-chip embedded DRAM cache. Power7 also has the highest levels of off-chip bandwidth ever achieved by a microprocessor. (Credit: IBM)
What makes the Power7 so powerful? Each chip has eight cores, and each core supports four-way multithreading. There's 32MB of level-3 cache on the chip, made using embedded DRAM (eDRAM) cells. Most CPUs use SRAM for cache because it's generally easier to combine with high-performance logic, but DRAMs--with only one transistor per bit--offer compelling density advantages. IBM spent years developing a new kind of eDRAM that would work with SOI (silicon on insulator) manufacturing processes, and the Power7 is the most advanced product to use the new technology.
Interestingly, the Power7 cores run much more slowly than those in the Power6 processor, which I wrote about here in 2007 ("Live from Hot Chips 19: Session 1, IBM's Power6"). The Power6 was designed to run very fast using a long CPU pipeline in order to deliver the highest possible performance on each thread of execution.
Maybe that strategy didn't work out as well as IBM hoped, because the Power7 returns to a more traditional microarchitecture with a shorter pipeline and much lower clock rates--though IBM didn't say exactly what those rates would be.
IBM did, however, promise that the Power7 would be roughly four times as fast as the Power6, chip for chip. Since it has four times as many cores, each of the new slower-clocked cores must still deliver about as much performance as those in the previous generation.
Chip-level performance must always be matched by off-chip connections lest the incoming data or outgoing results be bottlenecked by a too-slow channel. Accordingly, the Power7 is equipped with eight I/O channels for DRAM, each of which connects to an off-chip buffering device that splits the channel into two 64-bit DRAM interfaces. All together, IBM says the Power7 has 180 GBps of DRAM interconnect that can sustain over 100 GBps of effective memory bandwidth.
There's another 50 GBps of peak I/O bandwidth and a staggering 360 GBps of peak bandwidth used to let each Power7 chip communicate with others. The DRAM connected to each chip is thus shared across larger systems.
Combining these figures, IBM says a single Power7 has 590 GBps of total off-chip bandwidth. This isn't the real number, since many of those bytes are used for error-correcting codes and other overhead, but it's still pretty impressive.
So is Power7's die size: 567 square millimeters for 1.2 billion transistors. That's nearly a square inch! IBM says that if the 32MB L3 cache had been manufactured using SRAM, the transistor count would have been 2.7 billion instead.
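These Power7 figures are easy to sanity-check. The Python sketch below adds up the three bandwidth numbers, converts the die area to square inches, and estimates the transistor count of a hypothetical SRAM-based version. The six-transistors-per-bit SRAM cell is my assumption (it's the common design, but IBM didn't specify), and the estimate ignores cache tags, error correction, and redundancy.

```python
# Peak off-chip bandwidth figures IBM quoted for Power7, in GBps.
BANDWIDTH_GBPS = {"DRAM interconnect": 180, "I/O": 50, "chip-to-chip": 360}
total_gbps = sum(BANDWIDTH_GBPS.values())

# Die area: one inch is exactly 25.4 mm, so one square inch is 645.16 mm^2.
DIE_AREA_MM2 = 567
die_area_sq_in = DIE_AREA_MM2 / 25.4 ** 2

# Cache estimate. ASSUMPTIONS: 1 transistor per eDRAM bit (from the text),
# 6 transistors per SRAM bit (typical 6T cell, not stated by IBM).
L3_BITS = 32 * 2**20 * 8
EDRAM_TRANSISTORS_PER_BIT = 1
SRAM_TRANSISTORS_PER_BIT = 6
ACTUAL_TRANSISTORS = 1.2e9

extra = L3_BITS * (SRAM_TRANSISTORS_PER_BIT - EDRAM_TRANSISTORS_PER_BIT)
sram_version_transistors = ACTUAL_TRANSISTORS + extra

print(f"total off-chip bandwidth: {total_gbps} GBps")
print(f"die area: {die_area_sq_in:.2f} square inches")
print(f"SRAM-based estimate: {sram_version_transistors / 1e9:.2f} billion transistors")
```

The bandwidth total matches IBM's 590 GBps, and the die comes out a bit under nine-tenths of a square inch. The SRAM estimate lands around 2.5 billion, somewhat below IBM's 2.7 billion; presumably the overhead this sketch leaves out accounts for much of the gap.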
Still, Power7 wasn't the only high-end chip talked about at Hot Chips.
Rainbow Falls, a record for core count
Sun Microsystems was there to describe its forthcoming Rainbow Falls chip, which I assume will be marketed as the UltraSparc T3. The chip has 16 cores, each of which is reportedly able to manage 8 threads.
Sun's primary Rainbow Falls presentation focused on details of the chip's internal and external interconnects; a second talk described the cryptographic coprocessors present in each of its cores. These coprocessors--one for modular arithmetic (commonly used in public-key cryptography) and a cipher/hash unit to accelerate bulk ciphers like AES and secure hash algorithms--provide many times the performance of pure software implementations.
Fujitsu was also at Hot Chips to describe its eight-core, 2GHz Sparc64 VIIIfx processor, the latest in a long series of impressive designs from the company. Fujitsu quoted a peak performance figure of 128 GFLOPS (billions of floating-point operations per second) with a typical power consumption of just 58 watts. It did not, however, provide sustained performance or worst-case power consumption figures.
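Fujitsu's numbers imply a couple of derived figures worth seeing. This small Python sketch uses only the values quoted above; note that it divides peak performance by typical power, so the efficiency figure is an optimistic upper bound rather than anything Fujitsu claimed.

```python
PEAK_GFLOPS = 128.0    # peak figure quoted by Fujitsu
CORES = 8
CLOCK_GHZ = 2.0
TYPICAL_WATTS = 58.0   # typical, not worst-case, power

# Floating-point operations each core must complete per clock cycle at peak.
flops_per_core_per_cycle = PEAK_GFLOPS / (CORES * CLOCK_GHZ)

# Peak performance over typical power: an optimistic efficiency figure.
gflops_per_watt = PEAK_GFLOPS / TYPICAL_WATTS

print(f"{flops_per_core_per_cycle:.0f} FLOPs per core per cycle")
print(f"{gflops_per_watt:.1f} GFLOPS per watt (peak over typical)")
```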
AMD, Intel vie for high-volume servers
Few of us will have direct exposure to the IBM, Sun, and Fujitsu chips. A pair of presentations from Advanced Micro Devices and Intel described products that will be much more widely available.
AMD launched its six-core Opteron processor code-named "Istanbul" earlier this year (see Brooke Crothers' coverage from June). Next year the company will begin shipping a new Opteron model currently code-named Magny-Cours (after a racetrack in France). Magny-Cours will consist of two Istanbul chips in a single package, with twice as many DRAM interfaces to support the new processor's increased performance.
AMD also teased the audience with another mention of a new processor core design that has been under development there for several years: "Bulldozer," which is now targeted at 32nm process technology. This new core will incorporate new x86 instruction-set extensions which will probably not be adopted by Intel (a strategy that reminds me of AMD's old 3DNow extensions).
But saving the best for last--best, that is, from the perspective of anticipated sales--Intel's talk on Nehalem-EX showed just how far Intel has been able to push the technology envelope for high-volume servers.
Nehalem-EX is an eight-core version of the existing quad-core Nehalem design. The new chip also has 24MB of L3 cache done in old-school SRAM. By my calculations, about 60 percent of the chip's 2.3 billion transistors are in this cache alone.
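The post doesn't show the calculation behind the 60 percent figure, so here is one plausible reconstruction in Python. It assumes a six-transistor SRAM cell and, optionally, eight error-correction bits per 64 data bits; both are my assumptions, not numbers from Intel's talk.

```python
L3_BYTES = 24 * 2**20            # 24MB of L3 cache
SRAM_TRANSISTORS_PER_BIT = 6     # ASSUMPTION: typical 6T SRAM cell
TOTAL_TRANSISTORS = 2.3e9        # chip total quoted in the text


def cache_fraction(check_bits_per_64: int = 0) -> float:
    """Share of the chip's transistors in the L3 data array.

    check_bits_per_64 is a hypothetical ECC overhead: extra bits stored
    alongside every 64 data bits.
    """
    stored_bits = L3_BYTES * 8 * (64 + check_bits_per_64) / 64
    return stored_bits * SRAM_TRANSISTORS_PER_BIT / TOTAL_TRANSISTORS


print(f"data bits only: {cache_fraction():.0%}")
print(f"with 8 check bits per 64: {cache_fraction(8):.0%}")
```

The raw data bits alone come to a little over half the chip; adding error-correction overhead brings the estimate close to 60 percent.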
Nehalem-EX provides four links to external DRAM buffer chips supporting two DDR3 DRAM interfaces each (much like the Power7 solution) and four QuickPath Interconnect links that provide direct "glueless" connections for up to eight-processor systems (64 cores, 128 threads). Intel is also working on an external Node Controller chip for systems with up to 2,048 Nehalem-EX processors.
The aggregate bandwidth numbers for Nehalem-EX aren't as mind-boggling as those for Power7, but they're still far beyond anything available for PC-architecture servers today. Based on the presentation, I estimate Nehalem-EX could boast over 85 GBps of peak memory bandwidth and 100 GBps of chip-to-chip bandwidth, some of which must be allocated to I/O.
I expect the raw number-crunching performance of the Nehalem-EX cores to be roughly on the same level as Power7's cores. The lower ratio of bandwidth to processing power for Nehalem-EX reflects a different design target, not a design shortfall--and most importantly, a much lower selling price. There will presumably be versions of Nehalem-EX priced similarly to existing Xeon MP products, which currently top out at $2,301 each in small volumes. That's a very reasonable price to pay for the market's most advanced x86 server processor.
I spent Tuesday at Nvidia headquarters, attending the company's annual Analyst Day.
I've been to most of Nvidia's analyst events over the last decade or so, since I covered Nvidia almost from its inception while working as the graphics analyst at Microprocessor Report. These meetings are always a good way to get an update on the company's business operations, and sometimes--like this time--one provides exceptionally good insight into larger industry trends.
Nvidia's GeForce GTX 280 graphics chip (Credit: Nvidia)
Nvidia has had a rough couple of quarters in the market, which CEO Jen-Hsun Huang blamed in part on a bad strategic call in early 2008: to place orders for large quantities of new chips to be delivered later in the year. When the recession hit, these orders turned into about six months of inventory, much of which simply couldn't be sold at the usual markup.
In response, Nvidia CFO David White outlined measures the company plans to take to increase revenue, sell a more valuable mix of products, reduce the cost of goods sold, and cut back on Nvidia's operating expenses.
Three things stood out for me in this presentation:
Nvidia is planning an aggressive transition to state-of-the-art ASIC fabrication technology at TSMC, the company's manufacturing partner. Within "two to three quarters," White said, about two-thirds of the chips Nvidia sells will be made using 40-nanometer process technology. (The first of these chips were announced Tuesday.)
White also acknowledged something that I've long assumed to be true: Nvidia receives "preferential allocation" on advanced process technology at TSMC. It's logical that Nvidia should get the red-carpet treatment, having been TSMC's best customer for many years, but I don't recall hearing Nvidia or TSMC put this fact on the record before.
The third notable point from White's presentation: the gross margins for Nvidia's Tegra, an ARM-based application processor, are much higher than I'd have guessed, at "over 45 percent." (Nvidia's Mike Rayfield, general manager of the Tegra division, says Tegra has already garnered 42 design wins at 27 companies.) That's quite excellent for an ARM-based SoC; it's a very competitive market.
More surprises
The technical sessions at the event contained their own surprises.
For example, Nvidia effectively seized control of an old Intel marketing buzzword: "balanced."
For years, Intel used to talk about ...
I honestly don't know whether Om Malik's blog site, GigaOM, is intended to be informative or merely entertaining. I pointed out a previous example of the overwrought rhetoric that permeates that site last September (in the context of Comcast's then-new usage cap policy), but generally, I try to ignore the nonsense there for the same reasons that I ignore talk radio.
But like it or not, GigaOM is widely read, and sometimes when a post there bears directly on a market that's important to me, I can't bear to let it go. This is one of those times.
On Thursday, a GigaOM staffer wrote a piece titled "Can Intel Thrive in a Post x86 World?"
A slide from Fred Weber's keynote presentation at Microprocessor Forum 2003 showing how x86 will evolve into systems from big servers down to handheld consumer devices. (Credit: Advanced Micro Devices, Inc.)
The headline is preposterous from beginning to end. It has two implications just in the eight words of the title: that Intel's ability to "thrive" faces any imminent threats, and that the importance of the x86 architecture is declining.
In January, the same staffer wrote a piece titled "Netbooks and the Death of x86 Computing" which reached the fantastic conclusion that Netbooks would "destroy the hegemony of x86 machines for personal computing."
Well, as I pointed out just a few weeks later (in "The Netbook is dead. Long live the notebook!"), when the Netbook phenomenon ran up against the dominance of Intel and Microsoft in the PC market, it was the Netbook that died instead. Even at a $300 price point, people still want full PC compatibility.
Yes, there are companies like Freescale (the subject of the January post on GigaOM) and Nvidia that are looking to push the ARM architecture into the Netbook space. But that idea never made much sense, and now that Intel and TSMC are working together to get Intel's Atom x86 core into lower-cost SoC (system on chip) products, the ARM architecture will eventually have to retreat into the shrinking niche for supersmall, supercheap phones and consumer electronics gizmos for which x86 compatibility is of negligible value.
See, we learned a long time ago--those of us who cover this industry professionally, not just as a random assignment for some random blog--that the instruction set architecture (ISA), per se, doesn't matter any more.
The choice of ISA was a big deal in the 1980s and early 1990s, when the extra complexity of an x86 instruction decoder was a large fraction of the total complexity of a microprocessor. That's where the conflict between RISC and CISC came from.
But by the turn of the century, ISA complexity was almost a dead issue, and that coffin's final nail was pounded in by the keynote speech of then-Advanced Micro Devices CTO Fred Weber at Microprocessor Forum 2003, an event I had the honor of hosting.
In his talk, "Towards Instruction Set Consolidation," Weber made a simple point: "Technology has passed the point where instruction set costs are at all relevant."
Even then, three generations of process technology ago, the "x86 penalty" was down to a couple square millimeters of silicon. Today, the comparable figure is about 0.25 square millimeters. Not zero, certainly, but not a significant concern for chips that are a hundred times larger.
In short, ARM chips aren't cheaper or more power-efficient because of their instruction sets; they're like that because they're designed to be. And anything that an ARM chip can do to save cost or power can also be done by an x86 chip.
So there can't ever be a time when the world moves beyond x86. That's 1980s thinking, just plain ignorance of what may be the most important trend in the microprocessor industry.
The rest of Thursday's GigaOM post is a hopelessly self-contradictory muddle that fails to reach any clear conclusions. I'll just quote one more line near the end: "But the PC will be just one small (and shrinking) battleground to keep x86 relevant, amid a more mobile, visual, and power-sensitive world."
Current economic woes aside, the PC market is hardly shrinking. You know what's shrinking? The PC! As the PC shrinks, the PC market will grow. The MID (mobile Internet device) market isn't much to speak of right now, for example, but once MID makers figure out what to build, MIDs will become more popular.
And seriously, is anyone really not clear on the fact that the Apple iPhone is a computer? It isn't an embedded system. An embedded system is one in which the presence of a microprocessor is functionally irrelevant to the user. When a gizmo exposes its programmability to the user, it's a computer.
What else is the App Store but the visible manifestation of the iPhone's programmability?
Now, ARM isn't dead yet. The iPhone uses an ARM processor because there's no x86 processor that would work as well in that system. ARM processors will probably see at least two more generations in cell phones just because there's so much ARM-based software out there (including all the software on the App Store).
But somewhere around 2012, we're going to see x86 chips poking into that space. The value of instruction set compatibility with the PC market will persuade developers of new cell phone platforms to go with x86 chips, and eventually even established systems like the iPhone will switch over.
So not only are x86 chips selling into a growing PC market, they'll eventually start eating into ARM's own strongholds. That can't be bad for Intel.
And that's why the GigaOM piece was preposterous.
Don't get me wrong--I think the Intel-TSMC alliance announced earlier this week is a good thing for both companies.
But the official explanation, that Intel wants TSMC's help to make Atom processor cores more widely available to the industry, just doesn't strike me as a sufficient reason for the deal.
Intel hardly needs TSMC's help to make SoCs (systems on a chip). Intel has been making PC chipsets and highly integrated devices for the embedded market for a long time. It already has enough of the building blocks and enough experienced engineers to make Atom-based SoC products.
And it isn't as if Intel needs better process technology, or more fabrication capacity. Intel already has more of the best fabs in the world than any other company.
What's the one thing TSMC can do that Intel can't? Operate with low gross margins. In its most recent quarter, TSMC's gross margin was only 31.3 percent, while Intel's gross margin is still an industry benchmark at 53 percent. That gap of nearly 22 percentage points is larger than Intel's net profit margin--that is, if Intel had TSMC's gross margins, it would be losing money.
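Here's a rough sketch of that margin arithmetic. The two gross-margin figures are the ones quoted above. The net margin is a placeholder I made up for illustration, not Intel's reported number, and the model assumes operating costs stay at the same share of revenue.

```python
# Sketch of the gross-margin comparison.
# INTEL_GROSS_PCT and TSMC_GROSS_PCT come from the text; HYPOTHETICAL_NET_PCT
# is an invented placeholder, NOT a reported figure.

INTEL_GROSS_PCT = 53.0
TSMC_GROSS_PCT = 31.3
HYPOTHETICAL_NET_PCT = 15.0  # assumption for illustration only


def margin_gap(high_pct, low_pct):
    """Difference between two margins, in percentage points."""
    return high_pct - low_pct


def net_margin_at_lower_gross(net_pct, high_gross_pct, low_gross_pct):
    """Net margin if gross margin fell while every other cost kept
    the same share of revenue (a simplifying assumption)."""
    return net_pct - margin_gap(high_gross_pct, low_gross_pct)


gap = margin_gap(INTEL_GROSS_PCT, TSMC_GROSS_PCT)
result = net_margin_at_lower_gross(
    HYPOTHETICAL_NET_PCT, INTEL_GROSS_PCT, TSMC_GROSS_PCT
)

print(f"gross margin gap: {gap:.1f} points")
print(f"net margin at TSMC's gross margin: {result:.1f} percent")
```

Any net margin below the gap of about 21.7 points turns negative under this simple model, which is the whole point.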
Low-margin component suppliers are a critical element of the embedded-systems market, which Intel identified as one of its target markets for this deal. Cost is king in consumer electronics, so high-margin suppliers like Intel rarely get a chance to participate.
Similarly, as average PC selling prices decline, a growing share of the demand for processors and chipsets drops into price ranges in which Intel just can't afford to play.
The TSMC deal is Intel's way of taking a piece of these businesses without spending much money or taking much risk. For example, TSMC is already accustomed to helping its customers make SoCs for embedded systems. Intel could build such a business itself, but not at the margins it's used to.
Intel said in its press release that it will be porting its Atom cores to TSMC's technology. This is the sort of work that can get expensive in engineering time, but it's possible that the work will be made easier by a convergence between TSMC's processes and Intel's.
Last May, Intel agreed to cooperate with TSMC and Samsung in the transition to larger 450-millimeter silicon wafers (a little less than 18 inches across, up from the 12-inch wafers used today).
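To put those wafer sizes in perspective, here's a quick sketch of the conversion and the resulting gain in area. It assumes today's "12-inch" wafers are the nominal 300mm size and ignores edge exclusion and die-packing losses, so the area ratio is only an upper bound on the gain in chips per wafer.

```python
import math

MM_PER_INCH = 25.4


def mm_to_inches(mm):
    """Convert millimeters to inches."""
    return mm / MM_PER_INCH


def wafer_area_mm2(diameter_mm):
    """Area of a circular wafer, ignoring flats, notches, and edge exclusion."""
    return math.pi * (diameter_mm / 2) ** 2


NEW_DIAMETER_MM = 450
OLD_DIAMETER_MM = 300  # nominal size of a "12-inch" wafer (assumed here)

area_ratio = wafer_area_mm2(NEW_DIAMETER_MM) / wafer_area_mm2(OLD_DIAMETER_MM)

print(f"450mm = {mm_to_inches(NEW_DIAMETER_MM):.2f} inches")
print(f"300mm = {mm_to_inches(OLD_DIAMETER_MM):.2f} inches")
print(f"area ratio: {area_ratio:.2f}x")
```

The larger wafer has 2.25 times the area of the current one, which is why the industry is willing to pay for the transition.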
This doesn't necessarily mean that the three companies will co-develop fully compatible manufacturing processes, but with the 450mm transition being slated for 2012, there's still plenty of time left to drop that other shoe.
Anyway, this new TSMC deal is merely at the earliest official stage. The companies have signed a memorandum of understanding, but they have yet to work out the details. That could take a year, and it could be another year or two before Atom-based chips are ready to start rolling through the TSMC factory.
All in all, Atom SoCs might not become available from TSMC until 2012, at which point they could, in principle, be made on a common Intel-TSMC process.
Not that Intel would provide its really good process technology to TSMC. In chips, as in other things, quality is expensive. Intel's best process technology, which it uses primarily for microprocessors, is at the leading edge of semiconductor manufacturing, with features such as a metal electrode acting as the transistor's gate, a hafnium-based insulator between the gate and the channel, and strained silicon in the transistor channel itself (where the current flows when the transistor is on). (See this Intel presentation for more details. Incidentally, did Intel ever announce which metal it's using? If so, I can't find it.)
TSMC may not need or want any of these features, and it would make sense for Intel to keep its best process technology to itself, anyway, if only to protect its high profit margins.
Even without a leading-edge process, TSMC can still make good money from Atom-based SoCs in the embedded market. That's enough to justify TSMC's participation in the deal.
But I'm not sure that explains Intel's motivation. Sure, Intel will make money it wouldn't have made otherwise, but it will also have costs it wouldn't have had otherwise. Intel may make a few bucks per chip in intellectual-property licensing fees, and perhaps this could amount to hundreds of millions of dollars a year, but that isn't a whole lot of money to a company like Intel, which makes tens of billions of dollars a year in gross revenue.
Why else would Intel be doing this deal?
Well, I think that the chipmaker could be setting itself up to kill off three of its biggest rivals.
There's already an x86 processor company using TSMC to make (some of) its chips: Via Technologies. Via isn't a big player, but it's been a thorn in Intel's side ever since it purchased the x86 processor operations of IDT (WinChip) and National Semiconductor (Cyrix) in 1999.
Via specializes in exactly the kind of processors that Intel can't afford to sell: low-cost, highly efficient designs aimed at low-cost PCs and embedded systems. Today's Atom is better than Via's best chips, but it's also more expensive. A cheaper TSMC-sourced alternative will hurt Via badly.
Most of the same reasoning applies to ARM, which licenses its processor cores to be used in SoCs made at TSMC, among other fabs. That's almost the same business model Intel is adopting with its own TSMC deal.
ARM dominates the market for microprocessors in cell phones. Intel's current Atom processors are too expensive and too power-hungry for that market. But remember, it'll be a couple of years at least before Atom-based chips start shipping from TSMC. The Atom cores of 2011 or 2012 will be more directly competitive with ARM's cores.
So put ARM on the endangered-species list too.
There's one other company that ought to be worried by this deal, and it probably isn't one you'd expect: Nvidia.
Nvidia is generally thought to be TSMC's biggest customer. It doesn't make x86 processors (though there are persistent rumors that the company is developing one), but it does make the ARM-based Tegra family, which would run up against these future Atom chips.
It's Nvidia's graphics chips that I'm worried about, however.
Intel is developing graphics chips of its own under the Larrabee code name. I wrote about Larrabee last August, and it seemed like a bad idea to me at the time. One of my key objections, however, was that graphics chips are inherently a low-margin business due to the strong competition between AMD and Nvidia, and I didn't think that Intel could afford to drag down its margins just to compete in that market.
The TSMC deal changes all that.
Larrabee's cores aren't Atom cores, per se, but they're similar enough that Intel might consider them to be covered by the language in the TSMC partnership announcement. Or if not, agreements can always be expanded later.
Making Larrabee chips at TSMC would solve the margin problem, putting Intel's graphics chips on a level playing field with Nvidia's. Larrabee would still be at a significant disadvantage because its x86-based design isn't as well-suited to graphics acceleration as Nvidia's chips, but Intel has a special ability to sell inferior products along with other chips its customers need--especially processors. That's reportedly how Intel's slow integrated-graphics chipsets ended up in so many systems during the Windows Vista transition, leading to many disappointed customers.
Or it's possible that Intel will not allow the TSMC deal to harm these companies, if only because Intel may still be in court defending itself against AMD's antitrust lawsuit.
But I wouldn't make that assumption, and I bet that ARM, Nvidia, and Via won't either. Intel isn't the only paranoid company in this industry.
Intel announced on Monday that it will be presenting a paper at Siggraph 2008 about its "many-core" Larrabee architecture, which will be the basis of future Intel graphics processors.
The paper itself, however, has already been published, and I was able to get a copy of it. (Unfortunately, as you'll see at that link, the paper is normally available only to members of the Association for Computing Machinery.)
Intel's Larrabee includes "many" cores, on-chip memory controllers, a wide ring bus for on-chip communications, and a small amount of graphics-specific logic.
(Credit: Intel)
The paper is a pretty thorough summary of Intel's motives for developing Larrabee and the major features of the new architecture. Basically, Larrabee is about using many simple x86 cores--more than you'd see in the central processor (CPU) of the system--to implement a graphics processor (GPU). This concept has received a lot of attention since Intel first started talking about it last year.