Outside the Lines

Read all 'Powerset' posts in Outside the Lines
July 31, 2008 10:16 AM PDT

Radar Networks readies new release of Twine

by Dan Farber

In March, Radar Networks launched Twine, an application that organizes information and connects people, places, companies, products, Web pages, videos, and photos. Along with Metaweb's Freebase, Powerset (sold to Microsoft), Hakia, Reuters' Calais, AdaptiveBlue and a few other start-ups, Radar Networks is trying to crack the code on building a piece of the semantic Web.

In a Times Online article, Web creator Tim Berners-Lee gave an example of how the semantic Web would work:

"Imagine if two completely separate things--your bank statements and your calendar--spoke the same language and could share information with one another. You could drag one on top of the other and a whole bunch of dots would appear showing you when you spent your money."

Twine won't provide that futuristic capability, but it attempts to build a "semantic graph" of relationships between content, tags, people and Twines (the collection of items of an individual or group on the service). Each piece of content is a "semantic object" built on Twine's underlying ontology and database, which applies semantic technologies such as RDF for storing data, Radar Networks CEO Nova Spivack said.
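A "semantic graph" of this kind can be pictured as a set of subject-predicate-object statements, which is the basic shape of RDF data. What follows is a minimal sketch in plain Python, not Twine's actual schema or storage layer: every identifier and predicate name (memberOf, taggedWith, addedBy, and so on) is a hypothetical chosen for illustration.

```python
# Each statement is a (subject, predicate, object) triple, the basic unit
# of RDF. All identifiers and predicate names below are hypothetical.
triples = [
    ("user:alice", "memberOf", "twine:semantic-web"),
    ("user:bob", "memberOf", "twine:semantic-web"),
    ("item:42", "postedTo", "twine:semantic-web"),
    ("item:42", "addedBy", "user:alice"),
    ("item:42", "taggedWith", "tag:rdf"),
    ("item:57", "addedBy", "user:bob"),
    ("item:57", "taggedWith", "tag:rdf"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def items_tagged(triples, tag):
    """Items that carry the given tag, in sorted order."""
    return sorted(s for (s, _, _) in match(triples, predicate="taggedWith", obj=tag))


def people_linked_by_tag(triples, tag):
    """Follow two edges in the graph: tag -> items -> people who added them."""
    people = set()
    for item in items_tagged(triples, tag):
        for (_, _, person) in match(triples, subject=item, predicate="addedBy"):
            people.add(person)
    return sorted(people)


print(items_tagged(triples, "tag:rdf"))
print(people_linked_by_tag(triples, "tag:rdf"))
```

Because people, tags, items and Twines all live in the same list of statements, a question such as "who shares my interest in this tag?" becomes a matter of following edges rather than matching text.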

Spivack told me that public Twines are now visible to visitors to the site and to search engines. So far in the beta phase, nearly 15,000 Twines have been created and 354,000 pieces of user-contributed content have been added to the system. More than 50,000 users have signed up for the service (34,000 are active), spending 13 to 15 minutes per session on the site, he said.

A major new release of the Twine platform is slated for the fall to address shortcomings and introduce new features. "We have worked on a lot of simplification, reducing the clutter, and we still need to reduce more. Twine has a lot of powerful features nobody uses, so we are moving some of the advanced features out of the way," Spivack said. "The fall release will bring more intelligence and semantics to the surface. For example, we will let anyone define a new type of thing, such as a recipe or baseball team form, to author. It's more like what Freebase does, and we will also likely integrate with Freebase over time."

In addition, performance improvements and algorithms to improve search as well as mining and crawling content are in the works. "A major focus of our work is on personalization and recommendations," Spivack said. "Ultimately, Twine is about 'interest networking' and is a content distribution network. People declare their interests, add content, join Twines and connect with people. As users work with the system it learns about their interests, using artificial intelligence and semantic Web technologies to provide more relevance. We are not attempting to index the whole Web, just the best stuff of interest to users. Ninety-nine percent of what's on the Web is not interesting to a user, so it's more about high signal to noise."

On the business front, Spivack believes that Twine can be an intermediary for users, delivering more targeted marketing messages in addition to content. It's similar to the way Facebook is creating a new kind of environment for advertising based on knowing member interests and their social or semantic graph. "The goal for Twine is to be the place on the Web that best understands your interests and represents them to others. The key is to give users control and privacy," Spivack said.

Twine is a work in progress. It's ambitious and has the potential to demonstrate how a more semantic Web could benefit users. The biggest challenge will be scaling the back-end infrastructure and attracting users, which means Twine will have to become far easier to configure and use. We'll see in the coming months whether the forthcoming changes to Twine help open the floodgates.

Updated numbers on users and usage, 6:30 AM PST, August 1

July 3, 2008 12:34 PM PDT

EIC Squared: Indexing Flash; Powerset; and Viacom vs. Google

by Dan Farber

On this week's EIC Squared podcast, ZDNet's Larry Dignan and I discuss the big stories. It was a busy week on the search front. Adobe is providing Google and Yahoo with Flash Player technology that allows their search engine crawlers to find and index SWF content, including Flash "gadgets" such as buttons or menus and self-contained Flash Web sites. It's good to make more information accessible via search engines. However, Microsoft has been silent on whether Live Search would index Flash content.

In addition, Microsoft bought Powerset for about $100 million to enhance its search platforms. It's not a substitute for acquiring market share via Yahoo Search, but it provides a foundation for making the search experience far more compelling and precise in fewer clicks.

Of course, the Microhoo drama continues this week with the latest rumors. Larry is ready for this opera to be finished.

Finally, we discuss a judge's ruling in Viacom's $1 billion copyright infringement suit against Google and YouTube.

U.S. District Judge Louis L. Stanton ruled that records of every video watched by YouTube users, including login names and IP addresses, should be given to Viacom's lawyers. Larry said it was like combining the worst aspects of a fishing expedition and a witch hunt. Viacom is maintaining that it won't look at personal data and Google is asking for time to anonymize the information. If Judge Stanton's ruling stands, the last shreds of personal privacy on the Web could be thrown out the window.

July 1, 2008 11:55 AM PDT

It's official: Microsoft acquires Powerset

by Dan Farber

As expected (see previous reports), Microsoft scooped up Powerset to buttress its search efforts.

Barney Pell, Powerset co-founder and CTO

(Credit: Dan Farber)

It's not a replacement for increasing market share by acquiring Yahoo Search, but it gives Microsoft some differentiated search technology and top engineers for less than $100 million. Ramez Naam, group program manager of Live Search, said the Powerset negotiations happened in parallel with the Yahoo talks over the last few months. Google and Yahoo may also have been interested in Powerset, but no one is talking.

Whether Microsoft can leapfrog Google over the long term with this semantic engine remains to be seen.

Powerset had done a good job of creating a rich semantic layer on top of Wikipedia, but bringing natural language and slick semantic-based interfaces to the entire Web is a long-term and very costly endeavor.

"With an existing search infrastructure, incredible capital resources, unlimited data, a leading search team, and clear mission to revolutionize the search landscape, Microsoft can rapidly accelerate our progress in building semantic search technology and bringing it to full Web scale," Powerset's Mark Johnson said in a blog post about the acquisition.

Powerset can provide direct answers to queries from its Wikipedia and Freebase index and highlight the most relevant search results based on the meaning of the query.

According to a blog post from Satya Nadella, Microsoft's senior vice president of Search, Portal, and Advertising, Powerset's engineers will join the Search Relevance team and remain in San Francisco.

Back to the leapfrogging Google question. Much of what Powerset has enabled with its technology is a superior user experience for searching. Powerset's Wikipedia search, which surfaces concepts, meanings, and relationships (like subjects, verbs, and objects in a language), is the very small tip of the iceberg.

If Microsoft can succeed in extending Powerset's technology to key parts of the Web corpus, Google will have to figure out a way to match the quality and user experience. And there is little doubt that if Google decides that the combined work of Powerset and Microsoft is important, the company dedicated to dominating search through its engineering prowess will circle the wagons.

A few months ago, Powerset co-founder and CTO Barney Pell told me that his start-up company's software was a first step in changing the way users search and consume Web content. "It's a complete shift. You see this and you want to experience all content in this way. And, as an introduction, it will drive huge investment in semantic and linguistic technology, just as investments were made in information retrieval and scalable databases in the past," he said.

During a conversation after the announcement, Pell told me, "Natural language search will be the center of innovation for the next 20 years." It will likely take 20 years to engineer the semantic, natural language Web that Tim Berners-Lee envisioned in his 2001 essay in Scientific American.

May 11, 2008 9:25 PM PDT

Powerset brings the Semantic Web to Wikipedia

by Dan Farber

Amid speculation that Microsoft is looking to make an acquisition, Powerset launched a public beta of its Wikipedia search engine. It brings a new, rich semantic dimension via natural language query processing to Wikipedia that greatly improves the search and reading experience.

The company calls it a first step in changing the way users search and consume Web content. "It's a complete shift. You see this and you want to experience all content in this way," Barney Pell, co-founder and CTO of Powerset, told me. "And, as an introduction, it will drive huge investment in semantic and linguistic technology, just as investments were made in information retrieval and scalable databases in the past. People working in this space will be very marketable."

Users can enter keywords, phrases, or simple questions in Powerset's search box. Like many Web start-ups, Powerset is currently free of advertising.

Powerset's natural language search technology is based on patents licensed exclusively from PARC and its own proprietary indexing. Powerset's engine has read 2.5 million Wikipedia pages and extracted "meaning" from the sentences, creating a navigation and semantic layer on top of the popular Web encyclopedia. Following is a pictorial tour of Powerset features:

Powerset has also indexed Freebase, Metaweb's evolving, open database of structured information. The search result page presents Factz, a summary of key information extracted from Wikipedia pages.

Factz can be expanded to display more of the extracted verbs and their associated words and concepts.

Powerset creates a summary of information, or Dossier, on the right side of the page with Freebase and Wikipedia to give users a quick outline view about a topic. Clicking on an item takes the user to the location in the article and highlights the reference.

Powerset generates a summary of the key Factz to create a kind of Cliff's Notes version of a Wikipedia article. Clicking on a summary item takes the user to the reference location in the article and highlights the key words. Powerset also includes a page for disambiguation of queries.

Powerset also shows a tag cloud of things and actions found by its linguistic analysis engine on the page. Clicking on a word shows related Factz in the outline.

Powerset can provide direct answers to queries from its Wikipedia and Freebase index, and highlight the most relevant search results based on the meaning of the query. Hakia, another semantic search engine, as well as Google, can also surface the date Picasso was born at the top of their results pages.

Powerset's Wikipedia search engine isn't going to slow down Google in the near term, but it will raise the bar on the search experience for all players. "There are implications beyond Wikipedia," Pell said. "Search is not done. You can see the emerging Semantic Web with our integration of Wikipedia and Freebase. We will add other components with structured data and ways to answer questions."

Powerset has said that its longer-term plan is to read, linguistically analyze, and index 20 billion documents on the Web, which will be a costly and ambitious undertaking. (Getting acquired by Microsoft would be helpful for that project. Powerset received $12.5 million in Series A funding from Foundation Capital, Founders Fund, and angel investors in 2006.)

May 10, 2008 5:12 AM PDT

Is Microsoft stalking Powerset's search technology?

by Dan Farber

While Powerset is preparing for the public rollout of its unique, semantic search engine, Microsoft may be interested in acquiring the start-up, according to sources.

I asked Barney Pell, Powerset co-founder and CTO, whether there was any truth to the Microsoft-Powerset deal rumors. He said, "No comment," and noted his policy of not commenting on rumors. Microsoft also declined to comment on rumors.

Powerset co-founder and CTO Barney Pell

(Credit: Dan Farber)

Bringing Powerset, which has no revenue and a tiny user base at this point, into the fold would be spare change for Microsoft compared with spending $45 billion to $50 billion on Yahoo. But, it could bring something useful to Microsoft--and Yahoo, if their union were consummated--in the battle for search users with arch rival Google.

Based on a preview that I had of the service last month, Powerset raises the bar on search. Powerset differs from Google in that it extracts and indexes concepts, relationships, and meaning, rather than keywords. It's able to create connections and pivot in some cases in ways that elude Google's proficient engine, which favors more of a statistical approach.

Powerset uses a sophisticated natural language parser (licensed from Xerox PARC) to find subjects, verbs, objects, synonyms, and other elements for indexing.

Initially, Powerset is performing its magic on the 3 million pages of Wikipedia content, enabling a new kind of search and navigation experience on the popular information resource.

A next step would be to index the Web, which would be of great interest to Google rivals. Powerset has garnered $12.5 million in Series A funding from Foundation Capital, Founders Fund, and angel investors. Given the cost to scale up a semantically rich index of 20 billion Web pages, Microsoft would be a good match for Powerset. Then again, so would Google. Stay tuned...

April 16, 2008 7:15 AM PDT

On the road to the Semantic Web

by Dan Farber

The Semantic Web has been just around the corner for a few years. It turns out that bringing a semantic layer of metadata to the Internet is like climbing a mountain in flip-flops.

Tuesday night, Semantic Web mountain climbers Powerset, Radar Networks, and Metaweb participated in a salon at Powerset's San Francisco office, where I talked with them about their product plans.

Powerset gives wings to Wikipedia
I got a preview of Powerset's search engine, which is due to go into beta in the coming weeks, according to co-founder and CTO Barney Pell and as reported by TechCrunch.

Powerset differs from Google and other mainstream search engines in that it linguistically parses sentences, finding subjects, verbs, objects, synonyms, and other elements using a highly sophisticated, language-independent parser (licensed from Xerox PARC).

Powerset then extracts and indexes concepts, relationships, and meanings, rather than keywords. (I wrote about Powerset when it first came out of stealth mode, in June 2007.)
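The difference between indexing keywords and indexing relationships can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Powerset's method: the two sample sentences are invented for the example, and the (subject, verb, object) facts are written by hand to stand in for the output of a linguistic parser, which the sketch does not attempt to implement. All function and variable names are hypothetical.

```python
from collections import defaultdict

# Two illustrative sentences. The "facts" are hand-written stand-ins for
# what a linguistic parser might extract; no parsing happens in this sketch.
documents = {
    "doc1": {
        "text": "Picasso painted Guernica in 1937.",
        "facts": [("picasso", "paint", "guernica")],
    },
    "doc2": {
        "text": "Dora Maar photographed Picasso at work on Guernica.",
        "facts": [("dora maar", "photograph", "picasso")],
    },
}


def build_keyword_index(documents):
    """Map each lowercased word to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for word in doc["text"].lower().rstrip(".").split():
            index[word].add(doc_id)
    return index


def build_fact_index(documents):
    """Map each (subject, verb) pair to the objects and documents asserting it."""
    index = defaultdict(list)
    for doc_id, doc in documents.items():
        for subject, verb, obj in doc["facts"]:
            index[(subject, verb)].append((obj, doc_id))
    return index


def keyword_search(index, words):
    """Return the documents that contain every query word."""
    hits = [index.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*hits)) if hits else []


keyword_index = build_keyword_index(documents)
fact_index = build_fact_index(documents)

# Keyword matching cannot tell who did what to whom.
print(keyword_search(keyword_index, ["Picasso", "Guernica"]))

# A relationship lookup answers "What did Picasso paint?" directly.
print(fact_index.get(("picasso", "paint"), []))
```

Both sentences mention Picasso and Guernica, so the keyword query returns both documents, while the relationship lookup returns only the statement in which Picasso is the one doing the painting.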

Rather than trying to boil the search ocean, compete with Google, and deal with spam and 20 billion documents, Powerset has focused its initial efforts on giving wings to the 3 million pages of Wikipedia.

Hakia's semantic search engine also indexes Wikipedia and other sources. However, Powerset returns a more comprehensive dossier of results for queries, based on deep analysis of Wikipedia pages and other content, and also provides new ways to navigate and discover facts on the individual Wikipedia pages. More details to come when Powerset officially launches its public beta version.

Powerset plans to index the Web at some point (at a significant cost, in terms of servers and bandwidth). For now--or more precisely, when the company allows the public access to its technology--Wikipedia users will be the beneficiaries of a powerful semantic index and user experience.

True Knowledge
I also got a look at True Knowledge's search engine. Company CEO William Tunstall-Pedoe said the search engine is in private beta for now, with about 7,000 users.

Unlike Powerset and other search engines, Cambridge, England-based True Knowledge is building its own knowledge base. Users input facts, as in Wikipedia, but in a more structured manner. In addition, True Knowledge imports data from sources, including Wikipedia, in the form of discrete facts, such as "Sacramento is the capital of California."

Queries, including those in natural language, are parsed for machine reading, and they access the repository of facts accumulated. True Knowledge can make inferences, such as in the following example.

(Credit: True Knowledge)

The capability to infer truths based on the data repository would be a welcome feature for Wikipedia, which doesn't have an automated method for dealing with contradictions.
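To make the idea of inferring from discrete facts concrete, here is a minimal sketch in Python. It is not True Knowledge's engine: the predicate names (capital_of, located_in), the two inference rules, and the "one capital per place" check are all assumptions made for illustration, and the Los Angeles statement is a deliberately false fact added to trigger the contradiction check.

```python
# Facts are (subject, predicate, object) triples. The predicate names and
# both rules below are hypothetical, chosen only to illustrate the idea.
facts = {
    ("Sacramento", "capital_of", "California"),
    ("California", "located_in", "United States"),
}


def infer(facts):
    """Apply two illustrative rules repeatedly until nothing new is derived."""
    known = set(facts)
    while True:
        new = set()
        # Rule 1: a capital is located in the place it is the capital of.
        for (s, p, o) in known:
            if p == "capital_of":
                new.add((s, "located_in", o))
        # Rule 2: located_in is transitive.
        for (a, p1, b) in known:
            for (c, p2, d) in known:
                if p1 == p2 == "located_in" and b == c:
                    new.add((a, "located_in", d))
        if new <= known:
            return known
        known |= new


def find_contradictions(facts):
    """Flag places with more than one stated capital.

    Treating capital_of as single-valued is a simplifying assumption.
    """
    capitals = {}
    for (s, p, o) in sorted(facts):
        if p == "capital_of":
            capitals.setdefault(o, set()).add(s)
    return {place: sorted(names) for place, names in capitals.items() if len(names) > 1}


derived = infer(facts)
print(("Sacramento", "located_in", "United States") in derived)

# A deliberately false statement, added to show the contradiction check.
conflicting = facts | {("Los Angeles", "capital_of", "California")}
print(find_contradictions(conflicting))
```

The derived statement that Sacramento is in the United States was never entered by anyone; it follows from the two stored facts. The same structure lets conflicting statements be flagged automatically rather than left for readers to notice.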

Barney Pell (Powerset), William Tunstall-Pedoe (True Knowledge), Nova Spivack (Radar Networks), Paul Davison (Metaweb)

(Credit: Dan Farber/CNET News)

Metaweb
Another San Francisco Semantic Web start-up, Metaweb, was also a participant in the salon. The company's Freebase is more similar to True Knowledge than Powerset.

Freebase is a community-built database with a large corpus of open data sets, including Wikipedia and MusicBrainz. Powerset includes some Freebase-structured content in its index, and True Knowledge could add Freebase data to its knowledge repository.

Radar Networks' Twine
I also chatted with Nova Spivack, co-founder and CEO of Radar Networks. His company created Twine, an application combining bookmarking, blogging, and RSS reading, with an underlying semantic engine to tie the pieces of data together.

Spivack said Twine has about 7,000 users in private beta, as well as 40,000 standing in line for access. Half of the users have created private Twines, with corporations and closed communities of interest using the service for collaboration.

Major enhancements are planned for the summer and fall, including allowing for complete customization of the user interface. "We have only surfaced a bit of the platform so far. Twine as a platform will integrate with other applications, such as blogs, catalogs, social communities, and corporate sites," he told me.

"It's an enormous multiyear project," Spivack said. "It's not like a Google beta or a 1.0 version masquerading as a beta." The same could be said of the other Semantic Web services in the room. It's going to be a very long beta cycle.


About Outside the Lines

Dan Farber is the editor in chief of CNET News. He has covered technology for more than two decades, and he previously served as editor in chief of ZDNet, PC Week and MacWeek. Outside the Lines explores the intersection of business and technology.


