August 26, 2005 4:00 AM PDT
Search specialist stakes its claim on names
Among other reasons: Because Raouf Gadi, Elraouf Djeddi and Abdulrauf Aljadai might be the same person--and a regular search engine wouldn't reflect that.
Language Analysis Systems, or LAS, of Herndon, Va., has devised a series of tools for solving one of the thornier, but often overlooked, problems in search: finding data on a particular individual in a multicultural, error-prone world. The company's software takes into account alternative spellings, cultural nuances and other linguistic issues as part of an attempt to return the most relevant information for a search query, rather than a laundry list of close matches.
The company's tools are mostly sold to law enforcement, intelligence and border agencies, but financial institutions and other businesses--hoping to ferret out fraud or merely improve their customer databases--have begun to adopt the technology too, said Jack Hermansen, chief executive and co-founder of LAS.
Since the terror attacks of Sept. 11, 2001, security and intelligence agencies have been hunting vigorously for technology that will help them gather information on potential terrorists. Pixlogic, for instance, has developed software meant to spot anomalies or suspicious individuals in videotape from security cameras. Language Weaver, meanwhile, has come up with an Arabic-English real-time translation tool.
"The penalties for missing a name are enormous," Hermansen said. "Someone could die."
Contrary to what one might think, names aren't great search terms. Handles like "Bob Johnson" or "Ted Smith" are broad and pull up thousands of false positives. Even if the searcher types the name correctly, a typo in a document being sought or the use of a nickname could mean that the results omit needed information, or that a crucial link won't pop up on the first few screens of search results.
Cultural and linguistic differences compound the problem, Hermansen said. Someone called "Paul Ho" in the United States could easily be known as "Ho Wan Lee" in Hong Kong, and different documents may show his name appearing variously in Roman and Chinese characters.
Often, U.S. companies and agencies also mangle their data. One of the most common mistakes derives from assuming that the second element in a three-part name, such as "Maria Sanchez de Rodriguez," is a middle name.
"There seems to be an ethnocentric naivete in the U.S.," Hermansen said. "Trying to put a six-part Arabic name into a first-middle-last-name construction is going to raise havoc."
The subjects of these searches, of course, often try to avoid detection. Many years ago, although he was on a watch list, Mir Aimal Kansi got through customs and entered the United States using a common variation of his Urdu name--Kasi. In 1993, he killed two CIA operatives. (He was subsequently captured, convicted and executed.)
Some of the efforts by officials to correct these problems are comical, Hermansen said. One suggestion from the 9/11 Commission was that the U.S. government standardize spellings of names such as "Mohammed."
"You're going to tell naturalized citizens that they have to spell 'Mohammed' the same way," said Hermansen, who has lobbied against the directive. "Everyone would like it to be simple, but it isn't going to go away."
The software packages offered by LAS vary by function, and many customers deploy a combination of applications. The software is complemented by a database of 850 million names compiled and analyzed by LAS during the last 20 years. The company buys lists of names from information clearinghouses; to protect privacy, no personal information is collected, and first names are separated from last names before being delivered to LAS.
One tool, NameVariationGenerator, generates a list of common variations of a name--"Akbar," for instance, can be represented 20 different ways--and then searches for these variants.
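The generate-then-search idea can be illustrated with a minimal sketch. This is not LAS's method: the ALTERNATIONS table and the functions generate_variants and search are hypothetical names invented for illustration, and the letter swaps are hand-picked rather than drawn from any real linguistic model or name database.

```python
from itertools import product

# Hypothetical, hand-picked spelling alternations (an assumption for this
# sketch); a production system would derive rules from linguistic analysis.
ALTERNATIONS = {
    "k": ["k", "q"],
    "a": ["a", "e"],
    "b": ["b", "bb"],
}


def generate_variants(name):
    """Return every spelling reachable by swapping letters per ALTERNATIONS."""
    # For each letter, list its allowed spellings (or just the letter itself).
    choices = [ALTERNATIONS.get(ch, [ch]) for ch in name.lower()]
    # Combine one choice per position to build each candidate spelling.
    return {"".join(parts).capitalize() for parts in product(*choices)}


def search(records, name):
    """Return the records that contain any variant of name as a whole word."""
    variants = {v.lower() for v in generate_variants(name)}
    return [
        record
        for record in records
        if any(word.strip(".,").lower() in variants for word in record.split())
    ]


records = ["Visa issued to Aqbar Khan", "Invoice for Albert Khan"]
matches = search(records, "Akbar")
```

Here a query for "Akbar" also retrieves the record spelled "Aqbar," which an exact-match search would miss. Enumerating variants up front keeps the lookup itself a plain exact match, at the cost of a variant list that grows quickly as more alternations are added.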