For the last few years AdaptiveBlue has offered a semantically rich Web application that understands things such as books, movies, and music. Clicking on text, such as a company or movie name, brings up a context-sensitive menu of related links. The company is taking its technology a step further, adding a social dimension and renaming the product, "Glue." Along with Radar Networks' Twine and Powerset's Wikipedia search engine (acquired by Microsoft), Glue offers a compelling glimpse into how the Semantic Web will add a new, powerful level of intelligence to the Internet.
Rather than just connecting things to related data and services, Glue also connects things to people, and people to people and their things. For example, when a Glue user visits a site with things the software recognizes, such as a movie, artist, wine, book, restaurant, or stock quote, a bar appears at the top of the screen with a list of friends and other people in the Glue network who looked at that object. Users can leave brief comments to share an opinion with others.
Glue allows users in its social network to discover which friends share their interests without going to a central site.
"Glue works as a contextual filter," said Alex Iskold, founder and CEO of AdaptiveBlue. "We show relevant information from friends about the things they visit. They don't have to sift through lengthy lifestreams. For example, if you have 100 friends in FriendFeed, you are a human filter trying to sift through it and the information is completely out of context. The idea is to get the useful information 'chunked' contextually on the pages you visit. We are not asking people to change their habits."
The people surfaced in the Glue bar could have seen the object, such as a movie title, on a variety of sites. "People look at movies at different times and places, but the core semantic technology can understand the same thing and correlate it. As a movie fan, you just want to know what your friends think. It doesn't matter when or where the user visits things; Glue automatically connects them. There is no Glue destination site--the network is the user's context across the Web," Iskold said.
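The cross-site correlation Iskold describes can be pictured with a small sketch. This is not AdaptiveBlue's implementation; the normalization rule, the site names, and the visit log below are all hypothetical, chosen only to show how the same object seen on different sites could be reduced to one key and matched against a user's friends.

```python
from collections import defaultdict


def canonical_key(kind, title):
    """Reduce a recognized object to a site-independent key.

    Hypothetical normalization: lowercase the title and collapse whitespace.
    """
    return (kind, " ".join(title.lower().split()))


# Hypothetical visit log: (user, site, kind of thing, title as shown on that site).
visits = [
    ("alice", "imdb.com", "movie", "The Dark Knight"),
    ("bob", "netflix.com", "movie", "the dark  knight"),
    ("carol", "amazon.com", "book", "Snow Crash"),
]


def build_index(visit_log):
    """Map each canonical object key to the set of users who looked at it."""
    index = defaultdict(set)
    for user, _site, kind, title in visit_log:
        index[canonical_key(kind, title)].add(user)
    return index


def friends_who_saw(index, friends, kind, title):
    """Return the friends who viewed this object on any site, sorted by name."""
    seen_by = index.get(canonical_key(kind, title), set())
    return sorted(seen_by & set(friends))


index = build_index(visits)
print(friends_who_saw(index, {"alice", "bob", "carol"}, "movie", "The Dark Knight"))
```

Because the lookup key ignores the site, a visit on one site and a visit on another land in the same bucket, which is the "no destination site" idea in miniature.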
Glue allows users to add comments and indicate a "like" or favorite.
Glue also taps into existing social networks, such as Facebook and Twitter, to add friends, or to "follow" other people. The Glue Navigator allows users to browse the network of people and things, and what friends have identified as a "like" and what they have to say about objects. Glue can display all the music that a friend has viewed and drill down, offering contextual shortcuts to find out more, such as reviews and shopping links, about things on the Web. Glue remembers only the last 20 things visited, and the things "liked" or commented upon.
Each user has a profile page that shows likes and the number of followers and who the user is following. "It's a way of cross-pollinating interests. You can see what I am interested in and perhaps it is the same books or wine with which you have an interest," Iskold said. "Glue also allows you to claim pages that represent you, such as a blog, FriendFeed, or Twitter. It's an outlet where people know where to find and connect with you. For example, other Glue users could see what you are up to recently on your personal blog."
Glue impressed investors at RRE Ventures and Union Square Ventures (Series A Lead) enough to fund a $4.5 million series B round recently. The company has a good chance of making it through the meltdown.
In the midst of the financial meltdown and a contentious upcoming election, you might think the U.S. government and taxpayers are just funding wars, bank bailouts, and bridges to nowhere or somewhere. But this is the same government that funded the Internet way back when and is also funding the next generation of technologies that will make the current Internet seem like a Model-T.
Over the last several years, the U.S. government--via DARPA (Defense Advanced Research Projects Agency) grants--has invested hundreds of millions of dollars in PAL, an acronym for "Personalized Assistant that Learns." Smarter software and networks, along with augmented human intelligence, are useful in times of war and peace.
As part of the PAL project, more than $200 million of DARPA money has been poured into CALO (Cognitive Assistant that Learns and Organizes) over the last five years. CALO has been run out of SRI International with the assistance of 25 research organizations and 400 researchers.
At this point, management at Siri, the start-up commercializing elements of CALO, is being secretive about what the company is developing. The elevator pitch goes something like, "Users' online lives are becoming more complicated and getting out of control for mainstream users. What if there was an easy way for normal users (non-power users) to ask the Internet to help them."
According to the Siri PR pitch, the product is "a new interaction paradigm for the consumer Internet experience that applies intelligence at the interface." The company expects to release a beta version of its initial product in the first half of 2009, according to Dag Kittlaus, a former Telenor Mobile and Motorola executive who is a co-founder and CEO of the company.
"We have to be careful at this stage," Kittlaus told me. "We don't like to play these games, but we need to keep a tight lid on what we are specifically doing. We have some original ideas of what the product is going to do, but we don't want to spark ideas among potential competitors." Those competitors would likely be masters of the Internet with large Internet footprints and research prowess like Google, Microsoft, and Yahoo.
Kittlaus did allow that Siri has more than a dozen partners, presumably large, well-established distribution players that can help build a consumer market for Siri's product. Unlike most Web start-ups, Siri has a business model, Kittlaus claimed. "We have good business models, both existing and emerging. We think CPA (cost per action) is the future, and this specific application is good for CPA and we are partnering on that."
He also touted the pedigree of the company's current cadre of 19 employees. "They are mostly engineers from Yahoo, Google, SRI, NASA, and Xerox PARC," he said. The chief architect of the CALO project, Adam Cheyer, is a co-founder and vice president of engineering at Siri, and Tom Gruber, a well-known artificial intelligence and semantic Web expert, is a co-founder and CTO.
Cheyer described CALO as a superset of what Siri is developing. "The CALO project is building an automated assistant to help manage and improve your life. The technology spans all aspects of interaction--natural language processing, speech recognition, and planning and reasoning capabilities--and interfaces with all kinds of systems, such as email and contacts," he said.
(Credit: SRI International)
"Learning in the wild is core focus," he continued. "We want it to improve over time and learn from users with no coaching and without changing any code. We are taking the key elements from the project to commercialize it in a form that will delight users. We are not building systems that do things but that learn how to do things."
CALO sounds like a representation of the famous Apple Knowledge Navigator video from 1987.
"Siri is a subset of that concept," Cheyer said. "We have to keep in mind existing user behavior. It will feel like something close to what people use a lot. We will add speech recognition and other features as we go. We don't want to take such a leap that people cannot identify with it. We'll do things similar to but more advanced than what we do now. The longer term vision is the Knowledge Navigator, although it is an early chapter now and it might look different than that."
According to Gruber, intelligence at the interface allows the computers to make recommendations, like a personal assistant:
The interfaces we use to interact with the world's information are getting smarter. Web portals gave us someone else's idea of the content we should see. Then came search engines, which let us tell the system what we want, one query at a time. We are about to see the next wave -- intelligence at the interface -- in which the system knows about us, our information, and our physical environment. With knowledge about our context, an intelligent system can make recommendations and act on our behalf.
(Credit: Tom Gruber)
Siri may be working on more intelligent Web interfaces that can make inferences based on a wide variety of user activities (the "lifestream"), learning over time on its own, and then taking actions on behalf of users. For example, if you are booking travel or looking for a restaurant, Siri would know your preferences and about travel sites or restaurants, integrating data and context from multiple sources to deliver personal assistance. This could be especially useful in mobile scenarios where you don't want to wade through pages of search results or deal with complex interactions.
Tom Gruber: "If we want our technology to have world-changing impact, bring it to the interface: get useful knowledge from all those intelligent people on the Internet, and give the benefit of this knowledge to everyone."
(Credit: Tom Gruber)
We'll have to wait for next year, if the company stays on schedule, to see whether Siri can really define a new paradigm for experiencing the Web.
As expected (see previous reports), Microsoft scooped up Powerset to buttress its search efforts.
Barney Pell, Powerset co-founder and CTO
(Credit: Dan Farber)
It's not a replacement for increasing market share by acquiring Yahoo Search, but it gives Microsoft some differentiated search technology and top engineers for less than $100 million. Ramez Naam, group program manager of Live Search, said the Powerset negotiations happened in parallel with the Yahoo talks over the last few months. Google and Yahoo may also have been interested in Powerset, but no one is talking.
Whether Microsoft can leapfrog Google over the long term with this semantic engine remains to be seen.
Powerset had done a good job of creating a rich semantic layer on top of Wikipedia, but bringing natural language and slick semantic-based interfaces to the entire Web is a long-term and very costly endeavor.
"With an existing search infrastructure, incredible capital resources, unlimited data, a leading search team, and clear mission to revolutionize the search landscape, Microsoft can rapidly accelerate our progress in building semantic search technology and bringing it to full Web scale," Powerset's Mark Johnson said in a blog post about the acquisition.
Powerset can provide direct answers to queries from its Wikipedia and Freebase index and highlight the most relevant search results based on the meaning of the query.
According to a blog post from Satya Nadella, Microsoft's senior vice president of Search, Portal, and Advertising, Powerset's engineers will join the Search Relevance team and remain in San Francisco.
Back to the leapfrogging Google question. Much of what Powerset has enabled with its technology is a superior user experience for searching. Powerset's Wikipedia search, which surfaces concepts, meanings, and relationships (like subjects, verbs, and objects in a language), is the very small tip of the iceberg.
If Microsoft can succeed in extending Powerset's technology to key parts of the Web corpus, Google will have to figure out a way to match the quality and user experience. And there is little doubt that if Google decides that what Powerset and Microsoft are doing together is important, the company, dedicated as it is to dominating search through its engineering prowess, will circle the wagons.
A few months ago, Powerset co-founder and CTO Barney Pell told me that his start-up company's software was a first step in changing the way users search and consume Web content. "It's a complete shift. You see this and you want to experience all content in this way. And, as an introduction, it will drive huge investment in semantic and linguistic technology, just as investments were made in information retrieval and scalable databases in the past," he said.
During a conversation after the announcement, Pell told me, "Natural language search will be the center of innovation for the next 20 years." It will likely take 20 years to engineer the semantic, natural language Web that Tim Berners-Lee envisioned in his 2001 essay in Scientific American.
Amid speculation that Microsoft is looking to make an acquisition, Powerset launched a public beta of its Wikipedia search engine. It brings a new, rich semantic dimension via natural language query processing to Wikipedia that greatly improves the search and reading experience.
The company calls it a first step in changing the way users search and consume Web content. "It's a complete shift. You see this and you want to experience all content in this way," Barney Pell, co-founder and CTO of Powerset, told me. "And, as an introduction, it will drive huge investment in semantic and linguistic technology, just as investments were made in information retrieval and scalable databases in the past. People working in this space will be very marketable."
Users can enter keywords, phrases, or simple questions in Powerset's search box. Like many Web startups, Powerset is currently free of advertising.
Powerset's natural language search technology is based on patents licensed exclusively from PARC and its own proprietary indexing. Powerset's engine has read 2.5 million Wikipedia pages and extracted "meaning" from the sentences, creating a navigation and semantic layer on top of the popular Web encyclopedia. Following is a pictorial tour of Powerset features:
Powerset has also indexed Freebase, Metaweb's evolving, open database of structured information. The search result page presents Factz, a summary of key information extracted from Wikipedia pages.
Factz can be expanded to display more of the extracted verbs and their associated words and concepts.
Powerset creates a summary of information, or Dossier, on the right side of the page with Freebase and Wikipedia to give users a quick outline view about a topic. Clicking on an item takes the user to the location in the article and highlights the reference.
Powerset generates a summary of the key Factz to create a kind of Cliff's Notes version of a Wikipedia article. Clicking on a summary item takes the user to the reference location in the article and highlights the key words. Powerset also includes a page for disambiguation of queries.
Powerset also shows a tag cloud of things and actions found by its linguistic analysis engine on the page. Clicking on a word shows related Factz in the outline.
Powerset can provide direct answers to queries from its Wikipedia and Freebase index, and highlight the most relevant search results based on the meaning of the query. Hakia, another semantic search engine, as well as Google, can also surface the date Picasso was born at the top of their results pages.
Powerset's Wikipedia search engine isn't going to slow down Google in the near term, but it will raise the bar on the search experience for all players. "There are implications beyond Wikipedia," Pell said. "Search is not done. You can see the emerging Semantic Web with our integration of Wikipedia and Freebase. We will add other components with structured data and ways to answer questions."
Powerset has said that the longer term plan is to read, linguistically analyze and index 20 billion documents on the Web, which will be a costly and ambitious undertaking. (Getting acquired by Microsoft would be helpful for that project. Powerset received $12.5 million in Series A funding from Foundation Capital, Founders Fund, and angel investors in 2006.)
While Powerset is preparing for the public rollout of its unique, semantic search engine, Microsoft may be interested in acquiring the start-up, according to sources.
I asked Barney Pell, Powerset co-founder and CTO, whether there was any truth to the Microsoft-Powerset deal rumors. He said, "No comment," and noted his policy of not commenting on rumors. Microsoft also declined to comment on rumors.
Powerset co-founder and CTO Barney Pell
(Credit: Dan Farber)
Bringing Powerset, which has no revenue and a tiny user base at this point, into the fold would be spare change for Microsoft compared with spending $45 billion to $50 billion on Yahoo. But it could bring something useful to Microsoft--and Yahoo, if their union were consummated--in the battle for search users with arch rival Google.
Based on a preview that I had of the service last month, Powerset raises the bar on search. Powerset differs from Google in that it extracts and indexes concepts, relationships, and meaning, rather than keywords. It's able to create connections and pivot in some cases in ways that elude Google's proficient engine, which favors more of a statistical approach.
Powerset uses a sophisticated natural language parser (licensed from Xerox PARC) to find subjects, verbs, objects, synonyms, and other elements for indexing.
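To make the contrast with keyword indexing concrete, here is a toy sketch of an index keyed on subject, verb, and object rather than on individual words. It is not Powerset's engine and uses no real parser: the triples are written by hand to stand in for parser output, and the class name `TripleIndex` and its methods are hypothetical.

```python
from collections import defaultdict

# Hand-written (subject, verb, object) triples standing in for parser output.
facts = [
    ("picasso", "paint", "guernica"),
    ("picasso", "cofound", "cubism"),
    ("braque", "cofound", "cubism"),
]


class TripleIndex:
    """Index triples by role so a query can constrain any combination of slots."""

    def __init__(self):
        self._triples = []
        self._by_slot = defaultdict(set)  # (slot, value) -> ids of matching triples

    def add(self, subject, verb, obj):
        triple_id = len(self._triples)
        self._triples.append((subject, verb, obj))
        for slot, value in (("s", subject), ("v", verb), ("o", obj)):
            self._by_slot[(slot, value)].add(triple_id)

    def query(self, subject=None, verb=None, obj=None):
        """Return triples matching every slot that was given (None means 'any')."""
        constraints = [
            (slot, value)
            for slot, value in (("s", subject), ("v", verb), ("o", obj))
            if value is not None
        ]
        if not constraints:
            return list(self._triples)
        matching_ids = set.intersection(
            *(self._by_slot.get(c, set()) for c in constraints)
        )
        return [self._triples[i] for i in sorted(matching_ids)]


index = TripleIndex()
for fact in facts:
    index.add(*fact)

# "Who cofounded cubism?" becomes a query with the subject slot left open.
print(index.query(verb="cofound", obj="cubism"))
```

A keyword index would only know that "picasso" and "cubism" appear on the same page; keeping the roles separate is what lets a question leave one slot open and get the missing piece back as the answer.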
Initially, Powerset is performing its magic on the 3 million pages of Wikipedia content, enabling a new kind of search and navigation experience on the popular information resource.
A next step would be to index the Web, which would be of great interest to Google rivals. Powerset has garnered $12.5 million in Series A funding from Foundation Capital, Founders Fund, and angel investors. Given the cost to scale up a semantically rich index of 20 billion Web pages, Microsoft would be a good match for Powerset. Then again, so would Google. Stay tuned...
The Semantic Web has been just around the corner for a few years. It turns out that bringing a semantic layer of metadata to the Internet is like climbing a mountain in flip-flops.
Tuesday night, Semantic Web mountain climbers Powerset, Radar Networks, and Metaweb participated in a salon at Powerset's San Francisco office, where I talked with them about their product plans.
Powerset gives wings to Wikipedia
I got a preview of Powerset's search engine, which is due to go into beta in the coming weeks, according to co-founder and CTO Barney Pell and as reported by TechCrunch.
Powerset differs from Google and other mainstream search engines in that it linguistically parses sentences, finding subjects, verbs, objects, synonyms, and other elements using a highly sophisticated, language-independent parser (licensed from Xerox PARC).
Powerset then extracts and indexes concepts, relationships, and meanings, rather than keywords. (I wrote about Powerset when it first came out of stealth mode, in June 2007.)
Rather than trying to boil the search ocean, compete with Google, and deal with spam and 20 billion documents, Powerset has focused its initial efforts on giving wings to the 3 million pages of Wikipedia.
Hakia's semantic search engine also indexes Wikipedia and other sources. However, Powerset returns a more comprehensive dossier of results for queries, based on deep analysis of Wikipedia pages and other content, and also provides new ways to navigate and discover facts on the individual Wikipedia pages. More details to come when Powerset officially launches its public beta version.
Powerset plans to index the Web at some point (at a significant cost, in terms of servers and bandwidth). For now--or more precisely, when the company allows the public access to its technology--Wikipedia users will be the beneficiaries of a powerful semantic index and user experience.
True Knowledge
I also got a look at True Knowledge's search engine. Company CEO William Tunstall-Pedoe said the search engine is in private beta for now, with about 7,000 users.
Unlike Powerset and other search engines, Cambridge, England-based True Knowledge is building its own knowledge base. Users input facts, as in Wikipedia, but in a more structured manner. In addition, True Knowledge imports data from sources, including Wikipedia, in the form of discrete facts, such as "Sacramento is the capital of California."
Queries, including those in natural language, are parsed for machine reading, and they access the repository of facts accumulated. True Knowledge can make inferences, such as in the following example.
(Credit: True Knowledge)
The capability to infer truths based on the data repository would be a welcome feature for Wikipedia, which doesn't have an automated method for dealing with contradictions.
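As a rough illustration of inference over discrete facts, the sketch below stores facts as (subject, relation, object) triples, derives new "located in" facts, and flags one simple kind of contradiction. It is not True Knowledge's system: the relation names, the two rules, and every fact other than the Sacramento example are assumptions made for the sake of the example.

```python
# Facts as (subject, relation, object). Only the first comes from the article;
# the relation names and the second fact are assumptions for illustration.
facts = [
    ("Sacramento", "is_capital_of", "California"),
    ("California", "is_located_in", "United States"),
]


def infer_located_in(fact_list):
    """Derive 'located in' pairs using two assumed rules:
    1. a capital is located in the region it is the capital of;
    2. 'located in' is transitive.
    """
    located = {
        (s, o) for s, r, o in fact_list if r in ("is_located_in", "is_capital_of")
    }
    changed = True
    while changed:  # repeat until no new pair can be derived
        changed = False
        for a, b in list(located):
            for c, d in list(located):
                if b == c and (a, d) not in located:
                    located.add((a, d))
                    changed = True
    return located


def conflicting_capitals(fact_list):
    """Return regions that have been given more than one capital."""
    capitals = {}
    for s, r, o in fact_list:
        if r == "is_capital_of":
            capitals.setdefault(o, set()).add(s)
    return {region: names for region, names in capitals.items() if len(names) > 1}


print(("Sacramento", "United States") in infer_located_in(facts))
print(conflicting_capitals(facts + [("Los Angeles", "is_capital_of", "California")]))
```

The second function hints at why structured facts help with contradictions: once "capital of" is known to allow only one answer per region, two competing entries can be detected mechanically instead of by a human editor.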
Barney Pell (Powerset), William Tunstall-Pedoe (True Knowledge), Nova Spivack (Radar Networks), Paul Davison (Metaweb)
(Credit: Dan Farber/CNET News)
Metaweb
Another San Francisco Semantic Web start-up, Metaweb, was also a participant in the salon. The company's Freebase is more similar to True Knowledge than to Powerset.
Freebase is a community-built database with a large corpus of open data sets, including Wikipedia and MusicBrainz. Powerset includes some Freebase-structured content in its index, and True Knowledge could add Freebase data to its knowledge repository.
Radar Networks' Twine
I also chatted with Nova Spivack, co-founder and CEO of Radar Networks. His company created Twine, an application combining bookmarking, blogging, and RSS reading, with an underlying semantic engine to tie the pieces of data together.
Spivack said Twine has about 7,000 users in private beta, as well as 40,000 standing in line for access. Half of the users have created private Twines, with corporations and closed communities of interest using the service for collaboration.
Major enhancements are planned for the summer and fall, including allowing for complete customization of the user interface. "We have only surfaced a bit of the platform so far. Twine as a platform will integrate with other applications, such as blogs, catalogs, social communities, and corporate sites," he told me.
"It's an enormous multiyear project," Spivack said. "It's not like a Google beta or a 1.0 version masquerading as a beta." The same could be said of the other Semantic Web services in the room. It's going to be a very long beta cycle.
The inventor of the World Wide Web, Sir Tim Berners-Lee, isn't satisfied living on his past laurels. At every opportunity he talks up the Semantic Web, which he calls the "Web of the future."
In a recent article in the Times Online, he said that what Google has done so far pales in comparison with what the Semantic Web will bring. Social-networking leaders Facebook and MySpace will eventually be trumped by networks that connect all types of things, not just people, he said. To be clear, he wasn't saying that Google is doomed.
In the Times Online article, Berners-Lee gave an example of how the Semantic Web would work:
"Imagine if two completely separate things--your bank statements and your calendar--spoke the same language and could share information with one another. You could drag one on top of the other and a whole bunch of dots would appear showing you when you spent your money."
"If you still weren't sure of where you were when you made a particular transaction, you could then drag your photo album on top of the calendar, and be reminded that you used your credit card at the same time you were taking pictures of your kids at a theme park. So you would know not to claim it as a tax deduction."
Google's technology and approach to parsing the Web are based on statistical analysis of incredibly vast amounts of data. The Semantic Web involves creating a layer of metadata that enables rich connections between any type or piece of data.
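The bank-statement and calendar scenario quoted above comes down to two data sets sharing a field they both understand. The sketch below is a minimal illustration under that assumption, not a Semantic Web implementation: the records, field names, and the `overlay` function are all hypothetical, and the shared "language" is reduced to a common date value.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records; all names, dates, and amounts are invented for illustration.
transactions = [
    {"date": date(2008, 3, 15), "payee": "Theme Park Cafe", "amount": 42.50},
    {"date": date(2008, 3, 20), "payee": "Office Supply Co", "amount": 18.99},
]
events = [
    {"date": date(2008, 3, 15), "title": "Family trip to theme park"},
]


def overlay(transaction_list, event_list):
    """Attach to each transaction the calendar events that share its date."""
    events_by_date = defaultdict(list)
    for event in event_list:
        events_by_date[event["date"]].append(event["title"])
    return [
        (t["date"], t["payee"], t["amount"], events_by_date.get(t["date"], []))
        for t in transaction_list
    ]


for row in overlay(transactions, events):
    print(row)
```

Here the join works only because both sources were written with the same date type; the metadata layer described in the article is meant to provide that kind of shared vocabulary for data whose authors never coordinated.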
In 2006, Peter Norvig, Google's director of research, noted some challenges to building a Semantic Web, such as creating the metadata, agreeing on standards, and gaming the system.
"We deal with millions of Web masters who can't configure a server, can't write HTML. It's hard for them to go to the next step. The second problem is competition. Some commercial providers say, 'I'm the leader. Why should I standardize?' The third problem is one of deception. We deal every day with people who try to rank higher in the results and then try to sell someone Viagra when that's not what they are looking for. With less human oversight with the Semantic Web, we are worried about it being easier to be deceptive."
Peter Norvig, Google director of research
However, Norvig does envision a Web of connections far down the road. In a New Scientist article projecting into the future he stated:
In 50 years the scene will be transformed. Instead of typing a few words into a search engine, people will discuss their needs with a digital intermediary, which will offer suggestions and refinements. The result will not be a list of links, but an annotated report (or a simple conversation) that synthesises the important points, with references to the original literature. People won't think of "search" as a separate category--it will all be part of living.
The digital intermediary Norvig mentioned will be informed by semantic metadata, and search engines will take advantage of semantic metadata to deliver more precise and richer results.
Building Semantic Web applications has proven to be challenging so far. For example, Radar Networks just released a public beta of Twine, a personal information manager that uses Semantic Web technology, such as RDF (Resource Description Framework). With Twine, Radar Networks is trying to unleash the "semantic graph," which turns people, places, companies, products, Web pages, videos, photos and other data into Semantic Web content, according to Nova Spivack, CEO of the company.
Twine has met with some early criticism.
In response to the critique, Spivack wrote, "Twine is already far and beyond what any other semantic app I know of is capable of, but that still isn't good enough. We have to push further and focus more on usability. We are opening it up early in order to get feedback and more help testing and guiding the direction of the app from users."
"Ultimately, we will be the category killer for bookmarking, taking notes and organizing information," Spivack proclaimed to me in conversation today, noting that reaching a level of success was "definitely going to take time."
Radar Networks is not alone in trying to turn Semantic Web concepts into usable products. Other startups, such as Freebase, Powerset, Hakia, Blue Organizer, Wikia and Reuters' Calais, face a similar uphill climb to gain adoption.
What's evident is that Berners-Lee continues to be ahead of the curve. Just as the Internet was in gestation for decades, a semantic layer at the core of the Web will take decades to evolve.