Microsoft aims to build a better thesaurus
A team of researchers at Microsoft is looking to beat Roget at his own game.
Aiming to build a better thesaurus, the Writing Assistance project within Microsoft's research unit is tapping techniques developed to translate from one language to another.
Although thesauri are good at finding lots and lots of synonyms, they require the user to pick the right one because they aren't very good at understanding the context of what is being said. That's where the experience from doing machine translations comes in.
"We've taken the actual translation tables...and what we've done is we've taken those and said if a word in Chinese maps to two different English words maybe those two words are synonyms with some probability," said Christopher Brockett, a computational linguist and one of the Microsoft researchers leading the project.
The approach has two key benefits over a static thesaurus. First of all, the newer approach can do phrases, as opposed to single words. Also, it can draw on the context in which the phrase is used.
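One simple reading of Brockett's description can be sketched in a few lines of Python. This is only an illustration, not Microsoft's implementation: the tables, the probabilities, the romanized pivot words and the function name `pivot_synonyms` are all invented for the example. Two English words score as likely synonyms when they share a translation in the other language.

```python
from collections import defaultdict

# Toy translation tables (invented numbers, romanized placeholder pivot words).
# EN_TO_ZH[e][z] approximates P(z | e); ZH_TO_EN[z][e] approximates P(e | z).
EN_TO_ZH = {"big": {"da": 0.7, "juda": 0.3}}
ZH_TO_EN = {
    "da": {"big": 0.6, "large": 0.4},
    "juda": {"huge": 0.5, "big": 0.3, "enormous": 0.2},
}


def pivot_synonyms(word, src_to_pivot, pivot_to_src):
    """Rank candidate synonyms of `word` by summing over shared pivot words."""
    scores = defaultdict(float)
    for pivot, p_pivot in src_to_pivot.get(word, {}).items():
        for candidate, p_back in pivot_to_src.get(pivot, {}).items():
            if candidate != word:  # a word is not its own synonym
                scores[candidate] += p_pivot * p_back
    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for candidate, score in pivot_synonyms("big", EN_TO_ZH, ZH_TO_EN):
        print(f"{candidate}: {score:.2f}")
```

The same scoring would work with phrases as keys instead of single words. Using context, the second benefit Brockett cites, would need more than this sketch shows.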
Brockett plans to show off a prototype of the tool next week at TechFest, Microsoft's annual internal science fair. It's just one of dozens of projects that will be shown as part of an effort to expose Microsoft's business units to the work being done in Microsoft's research labs. (Check back next week for CNET's on-the-ground coverage of the event, which kicks off Monday night at Microsoft's campus in Redmond, Wash.)
TechFest is sort of like "The Dating Game" for Microsoft's research and product development arms. Research teams at Microsoft set up booths, somewhat like a high-school science fair, while product teams shuffle through looking for something that might give their efforts a leg up on the competition.
For the public, TechFest can also offer a glimpse at future product directions. For example, researcher Andy Wilson showed off a number of surface computing projects in the years leading up to the debut of Microsoft's Surface product.
As is the case with most of the projects, the thesaurus effort is still in its infancy.
"We're still working on the algorithms and how much work we give to the language pairs," Brockett said. "We have to get the quality up. There are usability issues that have to be looked into."
Over time, though, Brockett hopes the technique could be used to effectively translate whole sentences. Microsoft has a demonstration of that up on its Web site, but Brockett acknowledges such a treatment shows both the potential and the current limitations of the technology.
But would-be high-school plagiarists beware. Yes, the technology could someday translate the whole Wikipedia article for you, but it would likely translate the article the same way for all your classmates as well. And plagiarism detection software is evolving along with the science of machine translation.
As for the thesaurus itself, the technology would be a natural fit for Word, which already has a built-in traditional thesaurus. But the technology could also help Microsoft in another key area: search.
That's because while search engines are good at finding things like names, which have just one form, they have a harder time finding expressions that can be phrased in multiple ways.
That's less of an issue when searching across the whole Web. For example, searches for "Who shot Abraham Lincoln?", "Who killed Abraham Lincoln?" and "Who assassinated Abraham Lincoln?" all direct you to a page about John Wilkes Booth.
However, when it comes to searching smaller universes, such as a company's intranet, that might not be the case.
"You might not find it if the words are different," Brockett said. In such cases, automatically searching using similar phrases might boost the likelihood of finding a result.
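A rough sketch of that idea, in Python, might look like the following. It assumes a hand-written synonym table and a naive substring search over a tiny document list. The names `expand_query` and `search` and the sample documents are hypothetical, and a real system would presumably draw its alternatives from the translation-based thesaurus rather than a fixed dictionary.

```python
from itertools import product


def expand_query(query, synonyms):
    """Return the query plus every variant made by swapping in listed synonyms."""
    options = [[token] + synonyms.get(token.lower(), []) for token in query.split()]
    return [" ".join(combo) for combo in product(*options)]


def search(documents, query, synonyms):
    """Return documents containing the query or any expanded variant of it."""
    variants = [variant.lower() for variant in expand_query(query, synonyms)]
    return [doc for doc in documents if any(v in doc.lower() for v in variants)]


if __name__ == "__main__":
    # Hypothetical synonym table and a two-document "intranet".
    synonyms = {"shot": ["killed", "assassinated"]}
    documents = [
        "John Wilkes Booth assassinated Lincoln.",
        "Lincoln delivered the Gettysburg Address.",
    ]
    print(search(documents, "shot Lincoln", synonyms))
```

Without the expansion step, the query "shot Lincoln" matches neither document, because the only relevant one uses the word "assassinated".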
During her years at CNET News, Ina Fried has changed beats several times, changed genders once, and covered both of the Pirates of Silicon Valley. These days, most of her attention is focused on Microsoft.




Linux :- Unsecure
Reliable :- Windows
MS will make some small blunders in this and everyone will laugh and not use it
This guy probably had his team work for days, months or years in Microsoft labs thinking about a solution for this scientific/mathematical problem and all you think about is to bash Microsoft and not even critique the work based on the merits.
If you don't understand it, then just keep quiet and let the folks who know about the subject speak up.
Some employee of Google might jump in and say "that does not work... this is why..."
Whether this work is good, bad, or average all I ask is they leave it OFF by default.
Where does it say in the article that it's going to be part of Windows or any applications you use in the first place? What exactly do you want turned OFF? It's not even clear where the application of this technology is going to be.
So, you're saying Microsoft can't be trusted when you say you want Microsoft to turn-off something that you don't even know how it will affect the Microsoft application or OS that you are using.
We can ridicule a company or a technology, but it would be nice if we could offer our thoughts on why we are doing so. Otherwise, it would be just an unproductive bash.
But if you change:
"Who shot Pamela Anderson" (shot in the concept of ******* or in the concept of photography).
then
"Who killed Pamela Anderson" and "Who assassinated Pamela Anderson" mean a different thing.
You can see that it is not easy to switch synonyms at whim, because the new phrase can say a different thing.
LOL. ;-)
Sure, those fonts and paginations do make a document pretty. But, far from seeing anything to the contrary, I contemplate the now widespread use of "information technology" to write, blog, post, ad nauseam, and realize that I have been correct to stand my ground.
If it's anything approaching the dumb behemoth's offerings to date, I hope that someone checks the MS Wurd grammar for ... there is hope, is there not?
For I believe there is not. Not without the pen, that is.
- by TomKnorr February 25, 2009 1:21 PM PST
- It's about time that Microsoft is putting the pieces together, fashionably late as usual. We are talking ontology-based machine translation here. I wonder if they will find all the other gems that are in this technology.
We have been working on machine generated textual descriptions from conceptual knowledge for several years now. No Wikipedia, all presented information is machine generated and presented in any language, from the concept knowledge, not a translated word list.
We are also talking of user interfaces that "know", that are conceptually aware of what the user has selected on the screen, that "know" what information is missing, that learn when a new fact is added to a concept category and just about start asking questions themselves. We are talking about user interfaces that can be conferenced with native speakers all over the world showing the same subject pages in their native language at the same time.
The question of "who shot PA" obviously splits into (at least) two alternate translations, one of which will not make sense in the remainder of the story. You cannot just translate a sentence and leave it to stand alone. The MT will eventually have the knowledge of the whole story as a concept, the idea of the story, and will be able to rephrase the idea in the target language. This will not give you a "literal" translation in the sense of what a human translator would create, but it will grasp the essence of the original and present it in the target language, and (our system) at some point will know more about the original's subject than the originator of any story to translate.
Welcome to the club, this is what the next generation internet is all about.