Ask.com's Tomasz Imielinski discusses semantic search as Microsoft's Scott Prevost and Google's Peter Norvig look on.
(Credit: Tom Krazit/CNET News)
SAN JOSE, Calif.--If those chasing Google have anything to say about it, search on the Internet is going to become more of a conversation than an exchange of keywords.
Panelists from the four major search engines--Google, Yahoo, Bing, and Ask.com--joined Web search start-ups TrueKnowledge and Hakia at the Semantic Technology Conference to discuss the rise of semantic technology as the engine behind the still-nascent Internet search industry. Semantic search, the idea of divining a user's true intent from how they enter their queries and how Web data is structured, is an unfamiliar concept to most Web surfers, who tend to think Internet search is already pretty good.
It's not, according to Tomasz Imielinski, executive vice president, global search and answers at Ask.com. "Most users don't know how good search can be," he said, drawing an analogy to those who were satisfied with their portable music options until the iPod came along.
The W3C is devoting an entire week to the concept of semantic technology, which involves Web publishers and search engines working together to structure data so that it can be presented more appealingly than the "ten blue links"--a dirty term in the search industry these days--that most searchers have grown used to.
Yahoo has been banging this drum for a few years, introducing products like SearchMonkey to help Web publishers start organizing their content around semantic standards, said Andrew Tomkins, chief scientist at Yahoo Search. "Today on any major search engine, you'll see structured information about a restaurant," he said, meaning basic details such as its phone number, its address, or perhaps a link to a map of its location. Providing all of that requires agreement on standards.
But semantic search is also about improving the ability of search engines to analyze the meaning of plain text on a page, said Scott Prevost, general manager and director of product at Microsoft's Powerset division. A search engine that knows how to take a query and produce exactly what a person is looking for on the first page of results will prove attractive over time, he said.
The goal of all this work is to make search more intuitive, more like asking a friend or colleague a question, said Riza Berkan, CEO of semantic start-up Hakia. "We believe search is going to move to more conversational techniques," he said.
That's music to Ask.com's ears, of course. The company announced Wednesday that it now has 300 million question-and-answer pairs in its database, which Imielinski thinks provide context for searches.
But none of this work on semantic technology has done anything to dislodge Google from its position atop the search world, a position that actually grew a bit stronger over the past month, according to ComScore. Google's Peter Norvig acknowledged the benefits of semantic technology and agreed that Yahoo deserves credit for pushing it along. He drew applause from the several hundred attendees at the panel discussion when he discussed Google's decision to support RDFa semantic standards, announced last month at Searchology.
Still, there's an economic component to this debate that Google isn't quite buying. None of the panelists brought this up Wednesday, but last year Microsoft's Prevost admitted that the desire to make an end-run around Google's dominance of keyword-based search advertising is what has driven semantic technology research, at least to a certain degree. "If people aren't bidding on keywords, and are bidding on concepts, it could completely change the ball game," he said last August at the Search Engine Strategies conference.
For his part, Norvig argued Wednesday that conversational search is good for people who aren't quite sure what they are looking for, or who don't quite understand a certain topic. But those who do grasp a topic and want a fast answer are much more likely to use keyword searches, he said.
Corrected at 3:49 p.m.: This post originally misstated the title of Ask.com's Tomasz Imielinski. He is executive vice president, global search and answers. Corrected on Friday, 11:35 a.m., clarifying the W3C did not sponsor the conference.
In March, Radar Networks launched Twine, an application that organizes information and connects people, places, companies, products, Web pages, videos, and photos. Along with Metaweb's Freebase, Powerset (sold to Microsoft), Hakia, Reuters' Calais, AdaptiveBlue, and a few other start-ups, Radar Networks is trying to crack the code on building a piece of the semantic Web.
In a Times Online article, Web creator Tim Berners-Lee gave an example of how the semantic Web would work:
"Imagine if two completely separate things--your bank statements and your calendar--spoke the same language and could share information with one another. You could drag one on top of the other and a whole bunch of dots would appear showing you when you spent your money."
Twine won't provide that futuristic capability, but it attempts to build a "semantic graph" of relationships between content, tags, people, and Twines (the collection of items of an individual or group on the service). Each piece of content is a "semantic object" built on Twine's underlying ontology and database, which uses semantic technologies such as RDF to store data, Radar Networks CEO Nova Spivack said.
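Spivack's description of content as "semantic objects" in a graph of relationships can be illustrated with the subject-predicate-object triples that RDF is built on. What follows is a minimal Python sketch under stated assumptions, not Twine's actual implementation. The identifiers (`item:article42`, `person:alice`, `twine:semantic-search`) and the predicate names (`taggedWith`, `addedBy`, `partOf`, `memberOf`) are hypothetical. Plain strings stand in for the URIs that real RDF uses, and Python sets replace a real triple store.

```python
from collections import defaultdict


class TripleGraph:
    """A minimal in-memory store of (subject, predicate, object) triples.

    Illustrative only: identifiers are plain strings, not RDF URIs.
    """

    def __init__(self):
        self.triples = set()
        # Index outgoing edges by subject for quick lookups.
        self._by_subject = defaultdict(set)

    def add(self, subject, predicate, obj):
        """Record one relationship as a triple."""
        self.triples.add((subject, predicate, obj))
        self._by_subject[subject].add((predicate, obj))

    def objects(self, subject, predicate):
        """Return every object linked to `subject` through `predicate`."""
        return {o for p, o in self._by_subject[subject] if p == predicate}

    def subjects(self, predicate, obj):
        """Return every subject that points at `obj` through `predicate`."""
        return {s for s, p, o in self.triples if p == predicate and o == obj}


# Hypothetical content, people, tags, and a Twine, all stored as triples.
g = TripleGraph()
g.add("item:article42", "addedBy", "person:alice")
g.add("item:article42", "taggedWith", "tag:semantic-web")
g.add("item:article42", "partOf", "twine:semantic-search")
g.add("item:video7", "taggedWith", "tag:semantic-web")
g.add("person:bob", "memberOf", "twine:semantic-search")

# Everything tagged "semantic-web", whatever its content type.
print(sorted(g.subjects("taggedWith", "tag:semantic-web")))
# ['item:article42', 'item:video7']
```

Because every relationship is stored in the same shape, one query such as `subjects("taggedWith", ...)` finds articles and videos alike, without a separate table for each content type.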
Spivack told me that public Twines are now visible to visitors to the site and to search engines. So far in the beta phase, nearly 15,000 Twines have been created and 354,000 pieces of user-contributed content have been added to the system. More than 50,000 users have signed up for the service (34,000 are active), spending 13 to 15 minutes per session on the site, he said.
A major new release of the Twine platform, slated for the fall, aims to address shortcomings and introduce new features. "We have worked on a lot of simplification, reducing the clutter, and we still need to reduce more. Twine has a lot of powerful features nobody uses, so we are moving some of the advanced features out of the way," Spivack said. "The fall release will bring more intelligence and semantics to the surface. For example, we will let anyone define a new type of thing, such as a recipe or baseball team form, to author. It's more like what Freebase does, and we will also likely integrate with Freebase over time."
Performance improvements and new algorithms for search, content mining, and crawling are also in the works. "A major focus of our work is on personalization and recommendations," Spivack said. "Ultimately, Twine is about 'interest networking' and is a content distribution network. People declare their interests, add content, join Twines and connect with people. As users work with the system it learns about their interests, using artificial intelligence and semantic Web technologies to provide more relevance. We are not attempting to index the whole Web, just the best stuff of interest to users. Ninety-nine percent of what's on the Web is not interesting to a user, so it's more about high signal to noise."
On the business front, Spivack believes that Twine can be an intermediary for users, delivering more targeted marketing messages in addition to content. It's similar to the way Facebook is creating a new kind of environment for advertising based on knowing member interests and their social or semantic graph. "The goal for Twine is to be the place on the Web that best understands your interests and represents them to others. The key is to give users control and privacy," Spivack said.
Twine is a work in progress. It's ambitious and has the potential to demonstrate how a more semantic Web could benefit users. The biggest challenges will be scaling the back-end infrastructure and attracting users, which means Twine will have to become far easier to configure and use. We'll see in the coming months whether the forthcoming changes to Twine help open the floodgates.
Updated numbers on users and usage, 6:30 AM PST, August 1
As expected (see previous reports), Microsoft scooped up Powerset to buttress its search efforts.
Barney Pell, Powerset co-founder and CTO
(Credit: Dan Farber)
It's not a replacement for increasing market share by acquiring Yahoo Search, but it gives Microsoft some differentiated search technology and top engineers for less than $100 million. Ramez Naam, group program manager of Live Search, said the Powerset negotiations happened in parallel with the Yahoo talks over the last few months. Google and Yahoo may also have been interested in Powerset, but no one is talking.
Whether Microsoft can leapfrog Google over the long term with this semantic engine remains to be seen.
Powerset had done a good job of creating a rich semantic layer on top of Wikipedia, but bringing natural language and slick semantic-based interfaces to the entire Web is a long-term and very costly endeavor.
"With an existing search infrastructure, incredible capital resources, unlimited data, a leading search team, and clear mission to revolutionize the search landscape, Microsoft can rapidly accelerate our progress in building semantic search technology and bringing it to full Web scale," Powerset's Mark Johnson said in a blog post about the acquisition.
Powerset can provide direct answers to queries from its Wikipedia and Freebase index and highlight the most relevant search results based on the meaning of the query.
According to a blog post from Satya Nadella, Microsoft's senior vice president of Search, Portal, and Advertising, Powerset's engineers will join the Search Relevance team and remain in San Francisco.
Back to the question of leapfrogging Google. Much of what Powerset's technology has enabled is a superior search experience. Powerset's Wikipedia search, which surfaces concepts, meanings, and relationships (such as the subjects, verbs, and objects of sentences), is only the very small tip of the iceberg.
If Microsoft can succeed in extending Powerset's technology to key parts of the Web corpus, Google will have to figure out a way to match the quality and user experience. And there is little doubt that if Google decided that what Powerset and Microsoft are doing together was important, the company dedicated to dominating search through its engineering prowess would circle the wagons.
A few months ago, Powerset co-founder and CTO Barney Pell told me that his start-up company's software was a first step in changing the way users search and consume Web content. "It's a complete shift. You see this and you want to experience all content in this way. And, as an introduction, it will drive huge investment in semantic and linguistic technology, just as investments were made in information retrieval and scalable databases in the past," he said.
During a conversation after the announcement, Pell told me, "Natural language search will be the center of innovation for the next 20 years." It will likely take 20 years to engineer the semantic, natural language Web that Tim Berners-Lee envisioned in his 2001 essay in Scientific American.
Semantic search tool Powerset has put out a new iPhone app this week. Those looking to search on the go can now use the service's plain-English search capabilities to scour the entirety of Wikipedia and Freebase. The app comes after months of Powerset staff fumbling while trying to use their own product on the popular mobile device.
The new tool pulls up everything the desktop version does, although I found performance to be a tad slower--even over Wi-Fi. Outline, one of my favorite Powerset features, which gives you quick links to each section of a Wikipedia article, has also made its way into the pocket version. While not as convenient as on the desktop, where it sits beside the actual Wikipedia article, it's a great way to skip down to a lower section of an article, which normally means looking like a complete idiot as you continuously drag your finger up and down your phone's screen. There's also a much-needed search function, something the iPhone version of Safari lacks compared with its desktop sibling.
Depending on how popular this version becomes, I expect the company to come out with its own native app that will save past searches and let you store content locally. I've embedded some screens below. Also embedded after the break is a demo video of it in action.
Amid speculation that Microsoft is looking to make an acquisition, Powerset launched a public beta of its Wikipedia search engine. Using natural-language query processing, it adds a rich new semantic dimension to Wikipedia that greatly improves the search and reading experience.
The company calls it a first step in changing the way users search and consume Web content. "It's a complete shift. You see this and you want to experience all content in this way," Barney Pell, co-founder and CTO of Powerset, told me. "And, as an introduction, it will drive huge investment in semantic and linguistic technology, just as investments were made in information retrieval and scalable databases in the past. People working in this space will be very marketable."
Users can enter keywords, phrases, or simple questions in Powerset's search box. Like many Web startups, Powerset is currently free of advertising.
Powerset's natural language search technology is based on patents licensed exclusively from PARC and its own proprietary indexing. Powerset's engine has read 2.5 million Wikipedia pages and extracted "meaning" from the sentences, creating a navigation and semantic layer on top of the popular Web encyclopedia. Following is a pictorial tour of Powerset features:
Powerset has also indexed Freebase, Metaweb's evolving, open database of structured information. The search result page presents Factz, a summary of key information extracted from Wikipedia pages.
Factz can be expanded to display more of the extracted verbs and their associated words and concepts.
Drawing on Freebase and Wikipedia, Powerset creates a summary of information, or Dossier, on the right side of the page to give users a quick outline of a topic. Clicking on an item takes the user to the location in the article and highlights the reference.
Powerset generates a summary of the key Factz to create a kind of CliffsNotes version of a Wikipedia article. Clicking on a summary item takes the user to the reference location in the article and highlights the key words. Powerset also includes a page for disambiguating queries.
Powerset also shows a tag cloud of the things and actions its linguistic analysis engine found on the page. Clicking on a word shows related Factz in the outline.
Powerset can provide direct answers to queries from its Wikipedia and Freebase index, and highlight the most relevant search results based on the meaning of the query. Hakia, another semantic search engine, as well as Google can also surface the date Picasso was born at the top of their results pages.
Powerset's Wikipedia search engine isn't going to slow down Google in the near term, but it will raise the bar on the search experience for all players. "There are implications beyond Wikipedia," Pell said. "Search is not done. You can see the emerging Semantic Web with our integration of Wikipedia and Freebase. We will add other components with structured data and ways to answer questions."
Powerset has said that its longer-term plan is to read, linguistically analyze, and index 20 billion documents on the Web, which will be a costly and ambitious undertaking. (Getting acquired by Microsoft would be helpful for that project. Powerset received $12.5 million in Series A funding from Foundation Capital, Founders Fund, and angel investors in 2006.)
If you want to help the new search engine Powerset get built, check out Powerset Labs, opening up today. It lets you play around with very narrow "corpuses" of knowledge, one of which is a quotation database from Wikipedia. As you use the demos, you get to feed data back into the system to help the team tweak the engine.
This is a very eagerly anticipated search engine, but these limited Labs demos, while very cool, don't guarantee that the full general search engine will be as good as they are. It's definitely worth experimenting with, though: it really does look like a better way to search.
Powerset is demoing at the TechCrunch 40 event today.
Powerset, which is developing a natural-language search engine to rival Google, will finally launch its service in September after more than a year in the labs, according to the company's Web site. Powerset CEO Barney Pell will demonstrate the technology, called Powerlabs, next week while speaking at the Singularity Summit, a two-day conference on artificial intelligence and the "future of humanity" in San Francisco, according to the newsletter KurzweilAI.net.
Unlike search giant Google, Palo Alto, Calif.-based Powerset is using techniques in AI to train computers not just to read words on the page, but to make connections between those words, or make inferences in the language. That way, the search engine could think through and redefine relevance beyond the most popular page or the site with the most occurrences of keywords entered in a search box (which is the way Google works).
Beyond demonstrating Powerlabs, Pell plans to talk about challenges to AI. He raised the question on his blog: "How many man-hours have actually been applied to the task of creating human-level AI? The number is likely a tiny fraction of the research in AI fields to date," Pell wrote. "So with advanced computing and communications technology amplifying research and with a focused effort on the core problems, progress might come about faster than anyone thinks."
Other speakers at the two-day conference will include Google's Director of Research Peter Norvig and MIT AI Lab Director Rodney Brooks.