And I imagine that's going to be a huge change in video search, for example. Today when we do video searches, you are basically searching keywords on the Web page that surrounds the video, the description, that sort of thing. When we start using voice recognition to search within the videos, we'll have a much more powerful experience, right?
Gates: Yeah, that will help a lot. Microsoft Research has some amazing demos around that. In terms of broadcast videos, of course, there's the requirement that there be the text annotation. So if you have that, you actually have the speech-to-text that has been done for deaf viewers, for anybody who wants the captioning-type capability. So there's a lot of video out there where, if you ingest it in the right way, that's available. For the bottom-up video, or just a meeting you have in the business, you're relying on the speech recognition software to make it easy to navigate.
What are some of the areas where you see voice going that people aren't necessarily thinking about today?
Gates: To me, voice is in the broad realm of natural interface. And natural interface is (the notion of) screens everywhere--screen in your desk, screen in your tables, screen on your walls, no more whiteboards; touching, which is like Surface, where you can manipulate things; and the pen, so you can have ink wherever you want. You know, pull up an article, write a little note on it and get it sent off to a friend.
The speech recognition comes into it--all these things about natural interface are coming to the fore, and they are probably the thing that's most underestimated right now about the digital revolution. People kind of gasp when they see how touch works on Surface, when they touch their iPhone then, "Ooooh, wow," you know, that's just such a natural thing.
When voice recognition is used in the right way--let's say you're in the car and you want to pick somebody to call--that's improved very dramatically, or speech output, text to speech, these things have gotten very good.
You talked about different natural language interfaces. You know, multitouch seems to have really captured people's imaginations, both with what you guys have shown with Surface and certainly with the iPhone. Speech recognition seems to be a little slower to catch on as a mainstream computer interface.
Gates: Well, that's fair. Voice recognition is a harder thing. There are certainly tons of people, and I mean millions, for whom, for some reason, the keyboard's not attractive. Either they have repetitive stress injury, or they're in a work environment where they're doing something else with their hands. They've taken the time to learn the software, adapt to the software and go through the training process. And they love it. They can't believe other people don't use it.
For the rest of us, the keyboard has worked so well that we are even getting the keyboard into phones. I think voice search on the phone is one of those applications that would really drive it forward. I mean, why should I have to try and type something in? I've got a phone, I've got a talk button; so that's one of the areas we're betting on.
You guys built a pretty significant voice recognition engine into Vista. It hardly gets talked about. Are you surprised that some of the things you did in Vista aren't getting more attention?
Gates: Well, when you sell a product to hundreds of millions of users, there are features that millions of users love that you can call obscure because, percentage-wise, it's not very many. You know, Butler Lampson, one of our great researchers, who has done great work going all the way back to his days at Xerox, was just sending me mail about how fantastic the improvements in the speech stuff are in Vista. And, you know, we're hard at work on the next version of Windows. We're going to take this speech stuff even further.
What about in the developing world? I imagine natural language input, you know, particularly for people who've never used a computer, has some really interesting applications.
Gates: I wouldn't go too far on that because they're not used to what the dialogue should be like, and in most of those places, the cost of labor is low enough that, you (can) have another person on the other end of the connection or talking to them directly. But, yeah, it should work for different languages. It's particularly interesting for Japanese and Chinese where the keyboard is not as natural as it is for languages with modest-sized alphabets. And so we do see ink and voice catching on there.
There was a demo recently, a challenge pitting typists against voice recognition, and the voice recognition won out by quite a bit. So there's a lot that can be done pioneering off of the demand that will come out of those markets.
You've talked a fair amount about taking on just a few projects when you step away from full-time work. Is natural language input and voice one of those areas you think you'll be spending time on?
Gates: Yeah. I'd say, broadly, the whole natural interface thing. Big screens, touch, ink, speech: that's something that I think, along with cloud computing, is the next big change in how we think about software and how it becomes more basic. And, you know, Ray Ozzie is driving our cloud computing stuff and is way ahead of me, very hands-on with all that stuff. On some of the natural interface stuff, I think he and Steve will ask me to sort of keep the energy and vision alive there in a strong way. Some of that will be reading off the screen or the tablet, but the whole natural interface area probably will be one that they'll pick.
Any others that you think you will take on?
Gates: Well, it's hard to say. Search is such a fun area right now. They might pick that. There are some ideas about where the Office software should go--I'm really quite enthused about some things. So I'd say those are the three most likely. And it's only going to be three or four, so--they'll have to decide.