September 16, 2005 6:31 AM PDT
Gates on software innovation: 'Magic threshold'
This morning, you were speaking about some of the tough problems that software hasn't solved--speech recognition, security, presence. What's holding us back from solving those problems?
Gates: The pace of software innovation today is as fast as it has ever been. In speech recognition, over the past decade, the error rates have come down, down, down, down. Now, we haven't hit that magic threshold where speech recognition is better than the keyboard. It's hard to pick a date where it will be. We totally believe speech recognition will go mainstream somewhere over the next decade. When you use your phone, speech will be your primary input technique. At your desktop, it will be a mix of speech, keyboard and pen.
Our money is where our mouth is. It's like IPTV. I said over a decade ago that would happen. It took longer than I expected, but I'm sure glad we got in early and put the money behind it. I feel the same way about speech. It will be mainstream.
Let's talk about WinFS for a second. That's a good idea. But sometimes these really big ideas are difficult to implement when you have a really large installed base of customers, as Microsoft does, using various versions of Windows. That legacy problem seems to be an impediment to bringing new technology online. Does that get in the way of sweeping changes you'd like to make to Windows?
Gates: Well, that's the real world. We're in a very good position because we understand a lot of what is out there and how we can make moving up to the next thing very straightforward with the least amount of discontinuity. I've always been a big champion of WinFS. I was never satisfied that we were bringing it out as a client-only technology, and I was worried about that. Now we've chosen to skip doing it as a client-only thing and to do it as a big-bang client and server release. There's still a lot of work to be done on that one, and that's all wrapped up in this next release of SQL Server.
Those things are hard. They are fantastic when you get to them, because they greatly simplify things. It's the kind of thing that takes a company with a long-term approach on these things and willing to do something quite risky. The Office 12 user interface you saw this morning is another good example of that. Microsoft was willing to take that 2D menu structure that things are kind of buried in and blow that up. Here's Office, the most used software of all time, people are familiar and comfortable with it. Particularly our Office group that wants things to be exactly right. They decided it was time to step back (and do that). There will be some shock among users. But pretty quickly (people get used to it).







30 people with very diverse ages and accents all record a short
paragraph on tape. The idea was to evaluate the recordings and
determine the essential features that carried the meaning. Any
person could listen to the recordings and have no trouble
understanding the speakers. The goal was to find a way to
implement voice recognition, and perhaps to provide a means to
increase the density of telephone conversations on the existing
phone links.
But a funny thing happened. Massive statistical analyses, and
Fourier Transforms, and all sorts of other analytical techniques
failed to find any correlation between the recorded audio and the
message. It was almost like the message was not carried by the
voice itself.
So now, all the existing voice recognition programs take clumsy,
brute-force approaches, trying to match words to complex audio
waveforms. They sort of work, but still have a terrible time with
homonyms in context (e.g. to, two, too). And until both the
feature extraction and the contextual meaning analysis problems
are solved, serious voice recognition capabilities on the
computer, Mac or PC, are just pipe dreams.
Or so that's the story as I have heard it. Anyone with better
information?