April 24, 2003 1:52 PM PDT
IBM developing translation software
Wouldn't it be handy to be able to talk into a device, asking questions about departures and ticket prices, and have your queries translated into speech in the native language of the train officials?
IBM is working on software that would bridge the spoken language gap for weary travelers and others who might need a personal translator in their pocket.
Researchers at Big Blue are developing and testing translation software that would enable two people speaking different languages to communicate without either having to type.
"When you go to a new country and you want to deal with all of the issues, this will be very handy," said David Nahamoo, department group manager for human language technologies at IBM Research.
Although several companies, including IBM, produce software that provides text-to-speech translation, so-called speech-to-speech translation products remain on the horizon.
The prototype of the IBM software, dubbed Multilingual Automatic Speech-to-Speech Technology, or "MASTOR," actually does have a text component. Two people speak into microphones connected to a computing gadget. The first person might, for instance, say, "Hi, my name is David" in English. The gadget then converts the speech to text, displays it in English, translates it, displays the translation alongside the original English text, and then speaks the translated version.
If, for example, "David" were trying to chat with a native of Mexico City, the computer would display the text versions and say, "Hola, me llamo David." The other person could then reply in Spanish and have her response translated into English.
The research builds on products IBM has already introduced to the market. Last September, the company unveiled ViaVoice Translator, software that lets people type in a phrase in one language and hear it in another.
Speech-to-speech technology is a particularly tricky endeavor for technologists and linguists, partly because it incorporates so many complex functions. For example, MASTOR includes speech-recognition software to capture the original spoken phrase, translation software to transform it into Spanish, and text-to-speech software so that the computer can say the words aloud.
"The technology needs work in all of these areas," Nahamoo said.
Nahamoo thinks speech-to-speech research projects could drive improvements in speech recognition and translation software, because a glitch in any one component of a speech-to-speech system can make the other parts virtually impossible to use. What's more, even the best translation and speech software is susceptible to amusing and embarrassing errors as it tries to account for differences in speakers' accents, slang and cadence.
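To make the chain of components concrete, here is a minimal Python sketch of a speech-to-speech turn built from three stages: recognition, translation and synthesis. It is not IBM's code. The functions `recognize_speech`, `translate` and `synthesize_speech` are hypothetical stubs, the "audio" is faked as a string, and the one-entry phrasebook stands in for a real translation engine. The sketch also shows why a glitch in one stage cripples the rest: a single misrecognized word upstream leaves the translation step with nothing it can handle.

```python
from dataclasses import dataclass

# Hypothetical one-entry phrasebook standing in for a real translation engine.
PHRASEBOOK = {("en", "es"): {"hi, my name is david": "Hola, me llamo David."}}


def recognize_speech(audio: str) -> str:
    # Stub recognizer: pretends the "audio" is already a transcript.
    return audio.strip()


def translate(text: str, src: str, tgt: str) -> str:
    # Stub translator: exact lookup after lowercasing and trimming end punctuation.
    table = PHRASEBOOK.get((src, tgt), {})
    key = text.lower().rstrip(".!?")
    if key not in table:
        raise KeyError(f"no translation for {text!r}")
    return table[key]


def synthesize_speech(text: str) -> bytes:
    # Stub synthesizer: a real one would return audio samples, not encoded text.
    return text.encode("utf-8")


@dataclass
class TurnResult:
    source_text: str  # shown on screen in the speaker's language
    target_text: str  # shown alongside it in the listener's language
    audio: bytes      # what the device would speak aloud


def speech_to_speech(audio: str, src: str, tgt: str) -> TurnResult:
    source_text = recognize_speech(audio)            # speech -> text
    target_text = translate(source_text, src, tgt)   # text -> text
    return TurnResult(source_text, target_text, synthesize_speech(target_text))
```

For example, `speech_to_speech("Hi, my name is David", "en", "es").target_text` gives `"Hola, me llamo David."`, while a recognition slip such as `"Hi, my name is Davit"` makes the translation stage raise `KeyError`, so nothing reaches the synthesizer.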
Another feature of MASTOR will be its use of the notion of "meaning" to translate. Under this approach, phrases that mean roughly the same thing would be translated the same way. For example, a person could say "I'm injured and I need a doctor" or "Can you find me a doctor?" and both would be translated into an identical spoken phrase that conveys the need for medical help.
Using meaning in the translation process is less database-intensive than translating more precisely, researchers said. That means the technology could be used on handhelds and other small portable devices that don't have as much memory as a desktop computer.
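The idea of translating by meaning can be illustrated with a toy Python sketch: several phrasings are mapped to one concept, and each concept has a single stored rendering per target language. This is only an illustration of the general idea, not MASTOR's method. The concept names, the keyword matching and the `translate_by_meaning` function are hypothetical, and real systems extract meaning with far richer models than keyword lookup.

```python
from typing import Optional

# Hypothetical concept rules: a concept is detected if any keyword appears.
CONCEPT_KEYWORDS = {
    "REQUEST_DOCTOR": {"doctor", "physician", "medic"},
    "ASK_TICKET_PRICE": {"ticket", "price", "cost", "fare"},
}

# One stored rendering per concept per target language.
RENDERINGS = {
    "REQUEST_DOCTOR": {"es": "Necesito un médico."},
    "ASK_TICKET_PRICE": {"es": "¿Cuánto cuesta el boleto?"},
}


def extract_concept(utterance: str) -> Optional[str]:
    # Lowercase words with surrounding punctuation trimmed.
    words = {w.strip(".,?!'\"").lower() for w in utterance.split()}
    for concept, keywords in CONCEPT_KEYWORDS.items():
        if words & keywords:
            return concept
    return None


def translate_by_meaning(utterance: str, tgt: str) -> Optional[str]:
    # Different phrasings that hit the same concept get the same output.
    concept = extract_concept(utterance)
    if concept is None:
        return None
    return RENDERINGS[concept].get(tgt)
```

Both `translate_by_meaning("I'm injured and I need a doctor", "es")` and `translate_by_meaning("Can you find me a doctor?", "es")` return `"Necesito un médico."`. In this toy version, storage grows with the number of concepts and languages rather than with every possible phrasing, which is one way to picture the memory savings researchers describe. The trade-off is lost nuance: the detail that the speaker is injured does not survive.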
The researchers said the most immediate application of the software would be for personal or business travel or in health care settings like emergency rooms, where people who don't speak the local language might need to communicate information about their injury or medical history. However, the scientists declined to speculate about when the technology would appear in actual products.
Researchers also envision the speech technology being used to translate newscasts, enabling a media outlet to provide, for example, nearly real-time reports in multiple languages from an event such as the U.S. Open. They also say companies could use the technology for conference calls or meetings, although it still needs a lot of refinement before it's ready for such uses.