The company was set to release on Monday a new version of its WebSphere Information Integration OmniFind Edition corporate information management tool. It integrates technology called Unstructured Information Management Architecture (UIMA) that IBM designed to improve the processing of text within documents and other unstructured content sources to help find relationships and meaning beyond just keywords.
IBM, a longtime supporter of the open-source movement in which developers freely write and modify software and share code, also is presenting UIMA to the Open Source Technology Group, a network of online technology resources. The updated software tool is available from IBM now and is expected to be available through the SourceForge developers Web site by the end of the year.
"IBM has been investing in a huge initiative since 2001 in information integration to help companies integrate and find any information that exists across the enterprise," said Nelson Mattos, IBM's vice president of Information Integration.
"That's the number one problem in the enterprise world," he said, adding that studies show that workers spend on average 30 percent of their time looking for relevant information. The problem is exacerbated by the fact that about 85 percent of corporate data is unstructured and thus not easy to find, Mattos said.
More than 15 companies already have said they plan to support UIMA as a framework for search and text analysis of unstructured data, IBM said.
Projects currently using IBM's WebSphere Information Integration OmniFind include a quality-control early-warning system for the automotive industry to process warranty claims, repair requests and call-center logs that can help identify problems, and an advanced intelligence system for antiterrorism and law enforcement.
"There are lots of different ways to skin a cat when it comes to analyzing unstructured text, but all those ways only give you a sneak peek at what you might get," said Dana Gardner, an analyst at Interarbor Solutions. By using UIMA, companies get a more comprehensive extraction of the information they seek, he said.
"It probably will take some time for the various commercial products to put this software developers' kit to use and allow for their products to take part in the interoperability process," Gardner added.




So it had to do something. It got into the server business, and now into the software business: analyzing e-mails or other data. What is the big deal with that? As if any data is contained in e-mail whatsoever. Most e-mails get deleted without being opened. Only a week ago (on August 2), IBM rival Microsoft got a spammer to fork over more than $7 million for persistent spamming. Now comes word that IBM was analyzing those e-mails. What a way to make a business.
IBM can only regain its lead by financing a server-based method of surfing the net, so that the server is no longer required to send any data to the clients. Instead, it only receives the data from the clients and processes it. Once this is done, there would be no need for personal computers. Cellphones could be used to surf the web, as described at
http://www.newerawisp.blogspot.com
What IBM needs is a drive. The drive to be number 1 again.
InfoCodex already does all this today with the help of a linguistic database and synonym and/or similarity search across five languages (German, French, Italian, English and Spanish). With InfoCodex you can search for a block of text in one language, and it will find all the similar documents in the other languages as well. All of this is done without a single minute of training, because the linguistic database contains 2.9 million words and terms (e.g. "European Court of Justice" or "The President of the United States" are terms and are recognized as such).
See the following links:
http://www.ywesee.com/pmwiki.php/Ywesee/InfoCodexProcedure
http://www.ywesee.com/uploads/Ywesee/archimag-e.pdf
http://www.ywesee.com/uploads/Ywesee/Evaluationsentscheid-e.pdf
http://www.ywesee.com/uploads/Main/USP_e.pdf
- Nice thoughts, but already implemented in InfoCodex.
- by zdavatz August 9, 2006 2:34 AM PDT
- Thanks for the interesting article. Once again IBM is giving us a great vision about the future and how unstructured information can be searched.