MOUNTAIN VIEW, Calif.--Google's first search engine let people search by typing text into a Web page. Next came queries spoken over the phone. On Monday, Google announced the ability to perform an Internet search by submitting a photograph.
The experimental search-by-sight feature, called Google Goggles, has a database of billions of images that informs its analysis of what's been uploaded, said Vic Gundotra, Google's vice president of engineering. It can recognize books, album covers, artwork, landmarks, places, logos, and more.
"It is our goal to be able to identify any image," he said. "It represents our earliest efforts in the field of computer vision. You can take a picture of an item, use that picture of whatever you take as the query."
However, the feature remains in Google Labs, reflecting the "nascent nature of computer vision" and the service's present shortcomings. "Google Goggles works well on certain types of objects in certain categories," he said.
Google Goggles was one of the big announcements at an event at the Computer History Museum here to tout the future of Google search. The company also showed off real-time search results and translation of a spoken phrase from English to Spanish using a mobile phone.
"It could be we are really at the cusp of an entirely new computing era," Gundotra said, with "devices that can understand our own speech, help us understand others, and augment our own sight by helping us see further."
Offering one real-world example of the service in action, Gundotra said that when a guest came by for dinner, he snapped a photo of a wine bottle she gave him to assess its merits. The result--"hints of apricot and hibiscus blossom"--went far beyond his expertise, but that didn't stop him from sharing the opinion over dinner.
He also demonstrated Google Goggles by taking a photo of the Itsukushima Shrine in Japan, a landmark tourists may recognize even if they can't read Japanese. The uploaded photo returned a description of the shrine on his mobile phone.
Although the service could recognize faces, since faces are among the billions of images in its database, it doesn't do so for now, Gundotra said.
"For this product, we made the decision not to do facial recognition," Gundotra said. "We still want to work on the issues of user opt-in and control. We have the technology to do the underlying face recognition, but we decided to delay that until safeguards are in place."
Google's search is a near-constant work in progress as the company strives to grow beyond supplying search results in the form of 10 hyperlinks to various Web pages.
"It's not just about 10 blue links," said Marissa Mayer, Google's vice president of search and user experience. "It's about the best answers."
"In the past 67 days, we launched 33 different search innovations," she boasted. "That's one innovation every two days."
Three more in the pipeline came to light on Monday. First, the mobile version of Google's query-suggestion service, which proposes completions as people type a search, is now location-aware. That means, for example, a person in Boston typing "re" in a search box will see "Red Sox" as a suggested completion, while a person in San Francisco will see "REI."
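The Red Sox/REI example boils down to ranking prefix matches differently depending on where the user is. The sketch below is a minimal illustration, assuming completions are ranked by how often people in the same region searched for them; the region names, the `QUERY_COUNTS` table, and the `suggest` function are all hypothetical, and the counts are made up for illustration rather than drawn from Google data.

```python
# Hypothetical per-region query counts; illustrative numbers only, not Google data.
QUERY_COUNTS = {
    "boston": {"red sox": 900, "rei": 120, "restaurants": 400},
    "san francisco": {"red sox": 80, "rei": 700, "restaurants": 450},
}


def suggest(prefix, region, limit=3):
    """Return up to `limit` past queries that start with `prefix`,
    ordered by their popularity in `region` (most popular first)."""
    counts = QUERY_COUNTS.get(region, {})  # unknown region -> no suggestions
    prefix = prefix.lower()
    matches = [query for query in counts if query.startswith(prefix)]
    # Rank candidates by how often people in this region searched for them.
    matches.sort(key=lambda query: counts[query], reverse=True)
    return matches[:limit]


print(suggest("re", "boston", limit=1))         # ['red sox']
print(suggest("re", "san francisco", limit=1))  # ['rei']
```

The same prefix yields a different top suggestion purely because the regional counts differ; a production system would presumably blend regional and global signals rather than rely on one local table.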
Second, a "near me now" service due to launch in coming weeks can tell users of iPhones and Android devices what's near them at a particular moment. Third, location data supplied by the mobile phone can be used to tailor product search results, showing nearby stores that have a particular item in stock.
Google isn't shy about raising expectations for the service to a science-fiction level, where concepts such as augmented reality (an overlay of computer data that supplements what people see in the real world) have flourished for years.
Eventually, Google wants a system that lets people point to an object and retrieve information on it, Gundotra said--turning a person's finger into a real-world mouse pointer. "Today marks the beginning of that visual search journey," Gundotra said.
Google's system, like its Picasa face recognition software for photo management and face blurring in Google Maps' Street View, employs technology stemming from Google's 2006 acquisition of Neven Vision, a start-up focusing on face and object recognition. Founder Hartmut Neven, still a Google employee, was at Monday's event.
Neven expressed pride in one aspect of the system: much of its background work happens without human involvement, through a process he called "unsupervised learning."
"The algorithms [that] build models for visual recognition are unsupervised," Neven said. "Based on the photos we find, models--for example, the Empire State Building--will emerge."
Speaking of science fiction, Google also showed off technology that could turn mobile phones into a computerized translation system. It wasn't quite the babelfish of "The Hitchhiker's Guide to the Galaxy," but it did translate Gundotra's question about where the nearest hospital is located into Spanish.
The technology relies on a new communications channel to Google's servers. The mobile phone records the raw utterance and sends it to those servers, which first recognize it as English speech and then translate it into Spanish; the Spanish text is sent back to the phone. A text-to-speech synthesizer on the phone (for the demonstration, a Droid running Google's Android operating system) then reads the Spanish aloud.
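The division of labor described above (recognition and translation on Google's servers, speech synthesis on the phone) can be sketched as a simple round trip. This is a toy model under loud assumptions: every function name here is hypothetical, the "audio" is just UTF-8 text standing in for a recording, and a one-entry phrasebook stands in for real speech recognition and machine translation. It shows only the data flow, not how Google's systems actually work.

```python
# Hypothetical one-entry phrasebook standing in for a real translation model.
PHRASEBOOK_EN_ES = {
    "where is the nearest hospital": "¿Dónde está el hospital más cercano?",
}


def server_recognize(audio: bytes) -> str:
    """Stand-in for server-side speech recognition: treat the 'audio'
    as UTF-8 English text and normalize it."""
    return audio.decode("utf-8").strip().lower().rstrip("?")


def server_translate(english: str) -> str:
    """Stand-in for server-side machine translation via a phrasebook lookup."""
    return PHRASEBOOK_EN_ES.get(english, "[no translation]")


def server_handle(audio: bytes) -> str:
    """Server side: recognize the utterance as English, then return Spanish text."""
    return server_translate(server_recognize(audio))


def phone_speak(text: str) -> str:
    """Stand-in for the phone's on-device text-to-speech; returns what it would read."""
    return text


def phone_translate_utterance(audio: bytes) -> str:
    # 1. The phone uploads the raw recording to the server.
    # 2. The server sends back Spanish text.
    # 3. The phone's own synthesizer reads that text aloud.
    spanish_text = server_handle(audio)
    return phone_speak(spanish_text)


print(phone_translate_utterance(b"Where is the nearest hospital?"))
```

Keeping synthesis on the handset means only text, not audio, has to travel back over the network, which is consistent with the description of the demo.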
The service is set to launch in the first quarter of 2010, Gundotra said.
Google already offers the ability to search by voice--notably with applications for the iPhone and Android phones that today work in English and Mandarin Chinese.
Gundotra said Japanese now has joined the other options for the applications, and that more will come. "In 2010, you will see us dramatically expand our efforts and support more languages," he said.
Language is key to Google's mission and operations, and the company touted its progress in the area. Mayer said Google can now translate text from any of 51 languages into any other. In 2008, Chief Executive Eric Schmidt said the company expected to increase that number to 100 languages.
"We are working to break down the language barrier," Mayer said. "That focus is what unlocks the Web."