August 9, 2004 4:00 AM PDT

Next-generation search tools to refine results

SAN JOSE, Calif.--The vast corpus of human knowledge could soon be published on the Internet. The problem now is how to wade through it.

Although search engines have greatly enhanced access to information, and storage technology has made it cheap to digitize nearly everything, search tools need to be refined to make it easier to digest information or conduct queries. That was the word from researchers and speakers at the New Paradigms for Using Computers Conference, held at IBM's Almaden research lab here last week.

News.context

What's new:
Scientists are working on next-generation search engines and tools so users will be able to pick through the data on their hard drives and the Web.

Bottom line:
The amount of digital information is exploding, and unless inventions bubble up, we could get lost in the morass.

"We live in a world with lots of information but also lots of interruptions. It is a teriyaki of information. The question is, 'How do we survive in the marinade?'" joked Dan Russell, senior manager of user sciences and experience research at IBM Almaden.

Early attempts to better locate the world's information are already under way. The University of California at Berkeley, for example, showed off at the conference a prototype of a search engine called Flamenco that makes it easier to search for works of art or antiques. Santa Clara, Calif.-based Inxight, meanwhile, has created software that attempts to graphically represent latent connections between people or institutions by studying where and how they get mentioned on the Web.

On the desktop, companies such as Ingenuity Software, founded by former Apple Computer developer Bruce Horn, are creating tools designed to make it easier for people to index their photos and documents for subsequent Google-like searches on their hard drive.

These research efforts are in addition to new operating systems under development that will include better search tools.

Microsoft plans to add better search features to a future version of Windows, code-named Longhorn, due sometime around 2006 or 2007. The software giant last week demonstrated a more general Web search "service" that's also in development.

And Apple's Tiger, a new version of the company's Mac OS X operating system that's due next year, will include a new systemwide search engine called Spotlight that will allow Mac users to quickly search and find any file, Apple says.

How many books?
One of the surprises that has emerged from the Internet Archive, which is intended to become a repository of everything ever published, is that the body of public works can probably be corralled, said Brewster Kahle, founder of the organization.

About 100 million different books have been published in history, Kahle said, citing estimates from professor Raj Reddy at Carnegie Mellon University. About 28 million sit in the Library of Congress. On average, a book can be condensed to a megabyte in Microsoft Word. Thus, the books in the Library of Congress could fit into a 28-terabyte storage system.

"For the cost of a house, you could have the Library of Congress," Reddy said, adding that mass book-scanning projects are currently under way in India and China.

Only about 2 million to 3 million audio recordings--mostly music--have ever been published for public consumption. The Internet Archive has begun to store digitized recordings of concerts as well and has about 15,000 shows in its database to date. There are between 100,000 and 200,000 theatrical movies--half of them from India--in existence and about 20 terabytes of TV broadcasts a month. The Web grows by about 20 terabytes of compressed data a month as well. (One terabyte equals 1 trillion bytes.) Since 1984, about 50,000 software titles, including CD-ROMs, have emerged.

Though the legal issues around storing and viewing all this information remain thorny, storing it is doable.
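
The book arithmetic behind that claim is easy to check. The short Python sketch below simply multiplies the figures quoted above; it assumes "a megabyte" means 1 million bytes, to match the article's definition of a terabyte, and the function name is invented for illustration.

```python
# Back-of-the-envelope check of the storage estimates quoted in the article.
BOOKS_IN_LIBRARY_OF_CONGRESS = 28_000_000   # figure from the article
BOOKS_EVER_PUBLISHED = 100_000_000          # figure from the article
BYTES_PER_BOOK = 1_000_000                  # assumption: "a megabyte" = 10**6 bytes
BYTES_PER_TERABYTE = 1_000_000_000_000      # 1 trillion bytes, as the article states


def terabytes_needed(book_count: int, bytes_per_book: int = BYTES_PER_BOOK) -> float:
    """Return the storage needed for book_count books, in terabytes."""
    return book_count * bytes_per_book / BYTES_PER_TERABYTE


library_tb = terabytes_needed(BOOKS_IN_LIBRARY_OF_CONGRESS)
all_books_tb = terabytes_needed(BOOKS_EVER_PUBLISHED)
print(f"Library of Congress: {library_tb:.0f} TB")
print(f"All books ever published: {all_books_tb:.0f} TB")
```

By the same rough measure, every book ever published would take about as much room as five months of Web growth at the rate cited above.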

"Universal access to all human knowledge is within our grasp," Kahle said. "It could be one of the greatest achievements of all time."

Still, that's a lot to grasp. Similarly, individuals will experience an explosion in their personal catalogs of data. In the MyLifeBits project under way at Microsoft Research, noted scientist Gordon Bell is attempting to digitally capture all of the books, movies, TV shows, music and other media he has experienced in his life. He's up to 44GB of data so far.

E-mails, phone messages, photographs and personal video will also add to an individual's data trove. In another experiment, doctors in Cambridge, England, have equipped patients suffering from severe memory loss with a Microsoft SenseCam, a wearable camera that takes pictures when a person moves. One man is currently using it so he can show his wife, who has memory problems, a diary of the day, said Ken Wood, who works on the project.

Microsoft has also entered a three-year alliance with the Edinburgh International Festival in Scotland. In a likely experiment, attendees will wander about the arts fest with SenseCams around their necks, snapping shots.

Hide and seek
One approach to mastering data overload lies in developing search engines specialized for certain topics and data sets. That's the tack taken by Berkeley's Flamenco project.

In Flamenco, a Yahoo-like interface categorizes artworks drawn from museum collections around the world by content (animals, heaven and earth, shapes and colors, and so on), century, artist, medium (such as painting, furniture, sculpture) and other identifiers. By going up and down the tree, users can browse through all the animal pictures found in the database, or they can zero in on, say, the years 1700 to 1709 and discover that the period, at least as represented by the database, produced only four paintings of hoofed mammals.

The search engine does not search on the visual information contained in the picture, said Kevin Li, a student on the project. Instead, searches are conducted on descriptive text submitted by the museums that digitize their artwork for such databases.
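
The general idea of narrowing a collection by metadata categories can be sketched in a few lines of Python. This is only an illustration of faceted browsing over descriptive records, not Flamenco's actual code; the `Artwork` fields, the sample records and the function names are all invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artwork:
    title: str
    content: str   # subject facet, e.g. "animals"
    decade: int    # first year of the decade, e.g. 1700
    medium: str    # e.g. "painting"


# Invented records for illustration only; not taken from any museum database.
CATALOG = [
    Artwork("Horse in a Meadow", "animals", 1700, "painting"),
    Artwork("Stag at Dawn", "animals", 1700, "painting"),
    Artwork("Carved Oak Chair", "shapes and colors", 1700, "furniture"),
    Artwork("Two Hounds", "animals", 1750, "sculpture"),
]


def narrow(items, **facets):
    """Keep only the items whose metadata matches every requested facet value."""
    return [
        item for item in items
        if all(getattr(item, name) == value for name, value in facets.items())
    ]


def facet_counts(items, facet):
    """Count items per value of one facet, as a category listing might show."""
    counts = {}
    for item in items:
        value = getattr(item, facet)
        counts[value] = counts.get(value, 0) + 1
    return counts


animals = narrow(CATALOG, content="animals")
early_animal_paintings = narrow(animals, decade=1700, medium="painting")
print(facet_counts(animals, "decade"))
print([art.title for art in early_animal_paintings])
```

Because every filter works on text fields supplied with the record, nothing in the sketch looks at the image itself, which mirrors the limitation Li describes.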

Other tools, such as Inxight and GeoFusion, produce graphical representations of data obtained through searches. GeoFusion, which makes software that can extrapolate from geographic data, was able to render a map of the movements of a tagged tuna.

By contrast, Inxight's software creates a map of relationships between names and topics. A search on the White House and business showed that Halliburton is the corporation linked most often to the White House. In a similar fashion, IBM's own WebFountain project is used to test how cohesive certain blogging communities are by how quickly and in unison they react to news events.
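
One simple way to find which names are "linked most often" is to count how frequently they appear in the same document. The Python sketch below shows that co-mention counting on made-up text; it is an assumption about the general technique, not a description of how Inxight's product works, and the entity names and documents are placeholders.

```python
from collections import Counter
from itertools import combinations

# Hypothetical entity list and documents, invented for illustration.
ENTITIES = ["White House", "Acme Corp", "Globex"]
DOCUMENTS = [
    "The White House met with Acme Corp executives on Tuesday.",
    "Acme Corp and Globex announced a merger.",
    "Critics say the White House favors Acme Corp.",
    "Globex opened a new office.",
]


def co_mentions(documents, entities):
    """Count how often each pair of entities appears in the same document."""
    pair_counts = Counter()
    for text in documents:
        # Sort so each pair is always stored in the same order.
        present = sorted(name for name in entities if name in text)
        for pair in combinations(present, 2):
            pair_counts[pair] += 1
    return pair_counts


def strongest_link(pair_counts, entity):
    """Return the entity most often co-mentioned with the given one, or None."""
    linked = Counter()
    for (a, b), count in pair_counts.items():
        if a == entity:
            linked[b] += count
        elif b == entity:
            linked[a] += count
    return linked.most_common(1)[0][0] if linked else None


counts = co_mentions(DOCUMENTS, ENTITIES)
print(strongest_link(counts, "White House"))
```

The pair counts can also serve as edge weights if the relationships are later drawn as a graph.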

File systems will likely begin to disappear as search gains popularity. One of the phenomena that Microsoft researchers are finding in MyLifeBits is that files are largely ad hoc categories that become outdated, said Jim Gemmell at Microsoft Research.

Instead, data should be tagged so that if people remember a name or part of a name, they can find their way back to documents or pictures involving that person, or they can find documents created on the same day that they had a phone conversation with the person, even if the discussion involved something unrelated.
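
A toy version of that kind of lookup fits in a short Python script. The sketch below is not MyLifeBits code; the `Item` record, the sample data and both function names are invented to show how tags for people and dates could replace folder paths.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Item:
    name: str
    kind: str            # "document", "photo", "call", ...
    created: date
    people: frozenset    # names tagged on the item


# Invented sample data; names and dates are placeholders.
ITEMS = [
    Item("budget.doc", "document", date(2004, 8, 2), frozenset({"Alice Example"})),
    Item("picnic.jpg", "photo", date(2004, 8, 3), frozenset({"Alice Example", "Bob Sample"})),
    Item("call-0803", "call", date(2004, 8, 3), frozenset({"Bob Sample"})),
    Item("notes.txt", "document", date(2004, 8, 3), frozenset()),
]

# Index by day so "same day" lookups do not rescan every item.
by_day = defaultdict(list)
for item in ITEMS:
    by_day[item.created].append(item)


def find_by_person(fragment):
    """Items tagged with any person whose name contains the fragment (case-insensitive)."""
    needle = fragment.lower()
    return [i for i in ITEMS if any(needle in p.lower() for p in i.people)]


def same_day_as_calls_with(fragment):
    """Non-call items created on any day that had a call with a matching person."""
    call_days = {i.created for i in find_by_person(fragment) if i.kind == "call"}
    return [i for day in sorted(call_days) for i in by_day[day] if i.kind != "call"]


print([i.name for i in find_by_person("alice")])
print([i.name for i in same_day_as_calls_with("bob")])
```

In this sample, the second query returns notes.txt even though it carries no tag for the caller, which is the "something unrelated" case described above.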

"The problem is not that we keep too much with MyLifeBits. The problem is how to use it," Gemmell said.

Poorer nations will also be able to take advantage of these advances, even without an electrical grid. The Internet Archive has created bookmobiles in conjunction with Hewlett-Packard and others. Each bookmobile contains a printer hooked up to a satellite feed, which can print books for kids. Two are in operation in India, while another in rural Uganda prints about 1,500 books a week. An entire bookmobile, including the used van, costs $15,000, and 100-page books cost about $1 to print and bind in the van.

"It takes about 12 to 15 minutes to make a book," he said. "It is cheaper for a library in the United States to print and give away a book than retrieve it."

14 comments

This technology already exists- sort of
I found a utility called 'Locate' one day, and my life has been much better ever since! It works perfectly- each day, it runs in the background and indexes every file on my hard drive. Then, when I want to search (it searches across my network too), it returns results instantly. Here's a link for you- http://www.uku.fi/~jmhuttun/english/softwares.shtml.
Posted by Revo (4 comments )
Data mining for Mac users
If you're running Mac OS X you might want to check out
theConcept at www.mesadynamics.com. It says it can search
your desktop, web sites and search engines.
Posted by tobyp--2008 (19 comments )
Free Links to NEW SEARCH ENGINES
New search engines and tools are being introduced every day. To find more of these tools and search engines, see

http://www.searchenginesinternational.com
Posted by anthonycea (103 comments )
This technology already exists - and it searches fuzzy
For a while now I have been using a fuzzy-search tool called 'SERglobalBrain Personal Edition'. It's my favorite for finding textual information within my files and Outlook items. Its advantage over all other tools is its ability to do fuzzy searches, so I do not need to enter exact queries.

Can be found via

http://www.ser.com/product_showcase/globalbrain/index.asp
Posted by (2 comments )
vague discomfort with some issues
The article is interesting in many ways, but I find a few things that are problematic, both from the article and from some of the developments discussed. For example, the first paragraph includes a statement that search technology has made it cheap to digitize nearly everything. What about labor costs? Surely storage is now more affordable and doable, but the actual costs of the digitization process are still exorbitant, aren't they?

The MyLifeBits project is interesting, but myopic. Someone is attempting to digitally capture all of the books, movies, etc. and other media he has experienced in his life. What about other media that includes the environment? Conversations? Textures, temperatures, flavors? There's just something vaguely worrisome about this emphasis on our experiences that omits such a huge realm of our existence.

I'm a librarian, so the news about the Internet Archive's bookmobile is really good news. Interesting, too, as we (libraries) become publish-on-demand sites for more and more of our patrons.
Posted by (2 comments )
New search engines, data storage and retrieval
We're getting close to universality. We still have to solve the access problem and language dissonance. Is what I said what I meant, and did you hear what I said, or intuit what I meant?
But, for researchers and knowledge aficionados, the "holy grail" of anything, anytime, anywhere is getting closer.
I might pass an old-looking building in San Antonio, type in the address, and get the building's history. Or, search "terrorism," get a menu of ALL the information categories available, History, Origin, Current status, and much more. Now, that's usability.
A person's brain (our mental "search engine") innately uses the "drill down" technique of accessing memory until it reaches a "satisfactory" response. A person can broaden their internal search by adding associated and related memories.
My first kiss = Kathy = appearance=beautiful= red hair= blue eyes=time of day=nighttime =under stars=warm night,=,= ,=, all of this fleshing out the memory. Unfortunately, mental searching is analog, not gestalt in operation. So in retrieving a memory, our brain operates sequentially in the sense of "if there, then that."
Wouldn't it be great if search engine evolution provided a gestalt experience, combining sight, sound, context, dimension and texture, maybe even feelings?
Posted by bdennis410 (175 comments )