February 20, 2009 4:00 AM PST

Microsoft aims to build a better thesaurus

by Ina Fried

A team of researchers at Microsoft is looking to beat Roget at his own game.

Aiming to build a better thesaurus, the Writing Assistance project within Microsoft's research unit is tapping techniques developed to translate from one language to another.

Although thesauri are good at finding lots and lots of synonyms, they require the user to pick the right one because they aren't very good at understanding the context of what is being said. That's where the experience from doing machine translations comes in.

Brockett (Credit: Microsoft)

"We've taken the actual translation tables...and what we've done is we've taken those and said if a word in Chinese maps to two different English words maybe those two words are synonyms with some probability," said Christopher Brockett, a computational linguist and one of the Microsoft researchers leading the project.

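The article doesn't spell out how the prototype scores candidate synonyms. One well-known way to turn Brockett's intuition into numbers is "pivoting": the probability that English phrase e2 can stand in for e1 is the sum, over every foreign phrase f, of p(f | e1) times p(e2 | f). The minimal Python sketch below assumes two tiny, made-up translation tables. The romanized Chinese entries, the probabilities and the function name `pivot_synonyms` are all hypothetical, not Microsoft's data or code.

```python
from collections import defaultdict

# Hypothetical toy tables; all probabilities are invented for illustration.
# Chinese phrases are written in pinyin: "qiangsha" ~ shoot dead,
# "paishe" ~ photograph, "shasi" ~ kill.
# P_F_GIVEN_E[e][f] = p(f | e): English phrase e translates to Chinese phrase f.
P_F_GIVEN_E = {
    "shot": {"qiangsha": 0.6, "paishe": 0.4},
    "killed": {"qiangsha": 0.3, "shasi": 0.7},
    "photographed": {"paishe": 1.0},
}
# P_E_GIVEN_F[f][e] = p(e | f): Chinese phrase f translates back to English phrase e.
P_E_GIVEN_F = {
    "qiangsha": {"shot": 0.5, "killed": 0.5},
    "paishe": {"shot": 0.3, "photographed": 0.7},
    "shasi": {"killed": 1.0},
}


def pivot_synonyms(e1, p_f_given_e, p_e_given_f):
    """Rank candidate synonyms of e1 by p(e2 | e1) = sum_f p(f | e1) * p(e2 | f).

    e1 itself is skipped, and the remaining scores are not renormalized.
    """
    scores = defaultdict(float)
    for f, p_f in p_f_given_e.get(e1, {}).items():
        for e2, p_e2 in p_e_given_f.get(f, {}).items():
            if e2 != e1:
                scores[e2] += p_f * p_e2
    # Highest-probability candidates first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for word, score in pivot_synonyms("shot", P_F_GIVEN_E, P_E_GIVEN_F):
        print(f"{word}: {score:.2f}")
```

Run as a script, this prints "killed: 0.30" and then "photographed: 0.28". The word "shot" reaches "killed" through a Chinese word meaning "shoot dead" and "photographed" through one meaning "photograph", which is why a context-free list of synonyms still leaves the user to pick the right sense.
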
The approach has two key benefits over a static thesaurus. First, it can handle phrases, not just single words. Second, it can take into account the context in which a phrase is used.
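
How the prototype actually weighs context isn't described. As one hypothetical way to picture both benefits, the sketch below treats candidates as multiword phrases and reranks them by how well they fit the neighboring words. Bigram counts from a tiny invented corpus stand in for the far larger language model a real system would likely rely on. The function names, the multiplicative scoring formula and the smoothing constant `alpha` are assumptions for illustration only.

```python
from collections import Counter


def bigram_counts(corpus):
    """Count adjacent word pairs across a list of lowercase, space-separated sentences."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.split()
        counts.update(zip(words, words[1:]))
    return counts


def rank_in_context(left, right, candidates, bigrams, alpha=1.0):
    """Rerank candidate phrases for the slot between `left` and `right`.

    candidates maps each phrase (one or more words) to a context-free paraphrase
    probability. Each score is that probability times a crude context-fit term:
    the add-alpha-smoothed counts of (left, first word) and (last word, right).
    """
    ranked = []
    for phrase, p_para in candidates.items():
        words = phrase.split()
        fit = (bigrams[(left, words[0])] + alpha) * (bigrams[(words[-1], right)] + alpha)
        ranked.append((phrase, p_para * fit))
    return sorted(ranked, key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Tiny invented corpus standing in for a large text collection.
    corpus = [
        "the photographer took pictures of the model",
        "the photographer took pictures of the bride",
        "booth killed the president",
    ]
    bigrams = bigram_counts(corpus)
    candidates = {"killed": 0.30, "took pictures of": 0.28}  # hypothetical scores for "shot"
    print(rank_in_context("photographer", "the", candidates, bigrams)[0][0])  # took pictures of
    print(rank_in_context("booth", "the", candidates, bigrams)[0][0])  # killed
```

With "photographer" on the left, the phrase "took pictures of" wins; with "booth" on the left, "killed" does. Multiplying two scores is a deliberately simple choice; a production system would more plausibly combine many weighted features.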

Brockett plans to show off a prototype of the tool next week at TechFest, Microsoft's annual internal science fair. It's just one of dozens of projects that will be shown as part of an effort to expose Microsoft's business units to the work being done in Microsoft's research labs. (Check back next week for CNET's on-the-ground coverage of the event, which kicks off Monday night at Microsoft's campus in Redmond, Wash.)

TechFest is sort of like "The Dating Game" for Microsoft's research and product development arms. Research teams at Microsoft set up booths, somewhat like a high-school science fair, while product teams shuffle through looking for something that might give their efforts a leg up on the competition.

For the public, TechFest can also offer a glimpse at future product directions. For example, researcher Andy Wilson showed off a number of surface computing projects in the years leading up to the debut of Microsoft's Surface product.

As is the case with most of the projects, the thesaurus effort is still in its infancy.

"We're still working on the algorithms and how much work we give to the language pairs," Brockett said. "We have to get the quality up. There are usability issues that have to be looked into."

Over time, though, Brockett hopes the technique could be used to effectively translate whole sentences. Microsoft has a demonstration of that up on its Web site, but Brockett acknowledges such a treatment shows both the potential and the current limitations of the technology.

But would-be high-school plagiarists beware. Yes, the technology could someday translate the whole Wikipedia article for you, but it would likely translate the article the same way for all your classmates as well. And plagiarism detection software is evolving along with the science of machine translation.

As for the thesaurus itself, the technology would be a natural fit for Word, which already has a built-in traditional thesaurus. But the technology could also help Microsoft in another key area: search.

That's because while search engines are good at finding things, such as names, that have just one form, they have a harder time finding expressions that can be phrased in multiple ways.

That's less of an issue when searching across the whole Web. For example, searches for "Who shot Abraham Lincoln?", "Who killed Abraham Lincoln?" and "Who assassinated Abraham Lincoln?" all turn up a page about John Wilkes Booth.

However, when it comes to searching smaller universes, such as a company's intranet, that might not be the case.

"You might not find it if the words are different," Brockett said. In such cases, automatically searching using similar phrases might boost the likelihood of finding a result.

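The article doesn't say how Microsoft would wire paraphrases into search. A common technique for the situation Brockett describes is query expansion: run the user's query plus reworded variants against the index. The sketch below assumes a hand-written paraphrase table and a toy word-matching "intranet"; the document texts, `STOPWORDS` and all function names are hypothetical, and a real engine would rank results rather than simply match them.

```python
import re

# Very common words ignored when matching (an assumed, minimal list).
STOPWORDS = {"who", "what", "the", "a", "an", "of", "in"}


def tokens(text):
    """Lowercase word tokens with punctuation dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def expand_query(query, paraphrases):
    """Return the normalized query plus one variant per known paraphrase.

    paraphrases maps a lowercase phrase to alternative phrases; each variant
    swaps in exactly one alternative (combinations of swaps are not generated).
    """
    padded = " " + " ".join(tokens(query)) + " "
    variants = [padded.strip()]
    for phrase, alternatives in paraphrases.items():
        needle = " " + phrase + " "  # pad so "shot" doesn't match inside "shotgun"
        if needle in padded:
            for alt in alternatives:
                variants.append(padded.replace(needle, " " + alt + " ").strip())
    return variants


def search(documents, query, paraphrases):
    """Return ids of documents containing every content word of the query or of any variant."""
    hits = []
    for doc_id, text in documents.items():
        doc_words = set(tokens(text))
        for variant in expand_query(query, paraphrases):
            if set(variant.split()) - STOPWORDS <= doc_words:
                hits.append(doc_id)
                break
    return hits


if __name__ == "__main__":
    intranet = {
        "memo-1": "John Wilkes Booth assassinated Abraham Lincoln in 1865.",
        "memo-2": "Quarterly sales figures.",
    }
    table = {"shot": ["killed", "assassinated"]}
    print(search(intranet, "Who shot Abraham Lincoln?", {}))     # []
    print(search(intranet, "Who shot Abraham Lincoln?", table))  # ['memo-1']
```

Without the table, the literal query finds nothing because the memo says "assassinated" rather than "shot"; with it, the "assassinated" variant retrieves the memo. Generating one swap at a time keeps the number of variants small, at the cost of missing queries that need several substitutions at once.
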

During her years at CNET News, Ina Fried has changed beats several times, changed genders once, and covered both of the Pirates of Silicon Valley. These days, most of her attention is focused on Microsoft.
11 Comments
by aMUSICsite February 20, 2009 4:15 AM PST
Apple Computers :- Microsoft PC
Linux :- Unsecure
Reliable :- Windows

MS will make some small blunders in this and everyone will laugh and not use it.
by scdecade February 20, 2009 7:01 AM PST
Wonderful news!! Now Microsoft, please, please, please leave it turned OFF by default. Thank you.
by eadeguzman February 20, 2009 7:14 AM PST
aMUSICsite+scdecade. I know, I know... Microsoft is evil... they can't be trusted... blah, blah, blah... Aren't you guys tired of it yet?

This guy probably had his team work for days, months or years in Microsoft labs thinking about a solution for this scientific/mathematical problem, and all you think about is bashing Microsoft, not even critiquing the work on its merits.

If you don't understand it, then just keep quiet and let the folks who know about the subject speak up.

Some employee of Google might jump in and say "that does not work... this is why..."
by scdecade February 20, 2009 7:55 AM PST
Blah, blah, blah yourself. Where did I say MS is evil? Where did I say they can't be trusted? Aren't you tired of finding conspiracies where there are none? I use MS server products every day and I find them to be excellent.

Whether this work is good, bad, or average, all I ask is that they leave it OFF by default.
by eadeguzman February 20, 2009 10:56 AM PST
scdecade, good back-pedaling. You know what I mean and you know what you mean.

Where does it say in the article that it's going to be part of Windows or any applications you use in the first place? What exactly do you want turned OFF? It's not even clear where this technology is going to be applied.

So you're saying Microsoft can't be trusted when you ask Microsoft to turn off something without even knowing how it will affect the Microsoft application or OS you are using.

We can ridicule a company or a technology, but it would be nice if we offered our thoughts on why we are doing so. Otherwise, it is just an unproductive bash.
by Magallanes February 20, 2009 9:39 AM PST
"Who shot Abraham Lincoln?" "Who killed Abraham Lincoln" and "Who assassinated Abraham Lincoln"

But if you change it to:
"Who shot Pamela Anderson" (shot in the sense of ******* or in the sense of photography),
then "Who killed Pamela Anderson" and "Who assassinated Pamela Anderson" mean something different.
You can see that it is not easy to switch synonyms at whim, because the new phrase can change the meaning.
by Dalkorian February 20, 2009 12:14 PM PST
A real test of an M$ thesaurus would be in how many synonyms it returns for the expression "blue screen of death".

LOL. ;-)
by eadeguzman February 20, 2009 12:41 PM PST
Good one. :-)
by DrtyDogg February 24, 2009 3:15 PM PST
BSOD, Spinning Beach Ball, Kernel Panic.
by twangle February 23, 2009 8:39 PM PST
I always thought that it would never be wise to rely upon writing software to make a better job of writing than the mind of a reasonably literate human being.
Sure, those fonts and paginations do make a document pretty. But, far from seeing anything to the contrary, I contemplate the now widespread use of "information technology" to write, blog, post, ad nauseam, and realize that I have been correct to stand my ground.

If it's anything approaching the dumb behemoth's offerings to date, I hope that someone checks the MS Wurd grammar for ... there is hope, is there not?
For I believe there is not. Not without the pen, that is.
by TomKnorr February 25, 2009 1:21 PM PST
It's about time that Microsoft is putting the pieces together, fashionably late as usual. We are talking ontology-based machine translation here. I wonder if they will find all the other gems that are in this technology.
We have been working on machine-generated textual descriptions from conceptual knowledge for several years now. No Wikipedia: all presented information is machine generated and presented in any language from the concept knowledge, not from a translated word list.
We are also talking of user interfaces that "know", that are conceptually aware of what the user has selected on the screen, that "know" what information is missing, that learn when a new fact is added to a concept category and just about start asking questions themselves. We are talking about user interfaces that can be conferenced with native speakers all over the world, showing the same subject pages in their native language at the same time.

The question of "who shot PA" obviously splits into (at least) 2 alternate translations; one will not make sense in the remainder of the story. You cannot just translate a sentence and leave it to stand alone. The MT will eventually have the knowledge of the whole story as a concept (the idea of the story) and will be able to rephrase the idea in the target language. This will not create a "literal" translation in the sense of what a human translator would create, but it will grasp the essence of the original and present it in the target language, and (our system) at some point will know more about the original's subject than the originator of any story to translate.
Welcome to the club, this is what the next generation internet is all about.

About Beyond Binary

Beyond Binary is a look at how technology is changing our lives and the people behind all that life-changing stuff, with an extra emphasis on that which emanates from Redmond, Wash.
