Outside the Lines

March 8, 2009 8:42 AM PDT

Wolfram Alpha: Next major search breakthrough?

by Dan Farber
(Credit: Wolfram Research)

Stephen Wolfram has a track record of scientific breakthroughs and some controversy. He received his Ph.D. in theoretical physics from Caltech in 1979 when he was 20 and has focused most of his career on probing complex systems. In 1988 he launched Mathematica, powerful computational software that has become the gold standard in its field. In 2002, Wolfram produced a 1,280-page tome, A New Kind of Science, based on a decade of exploration in cellular automata and complex systems. The book stirred up a lot of debate in scientific circles. Legendary physicist Freeman Dyson described the tome as "a case of style over substance." (See Steven Levy's Wired profile of Wolfram).

In May, Wolfram will unveil his latest creation, now called Wolfram Alpha. It applies his work with Mathematica and NKS (A New Kind of Science) to Web search. "All one needs to be able to do is to take questions people ask in natural language, and represent them in a precise form that fits into the computations one can do," Wolfram said in a recent blog post. "I'm happy to say that with a mixture of many clever algorithms and heuristics, lots of linguistic discovery and linguistic curation, and what probably amount to some serious theoretical breakthroughs, we're actually managing to make it work...It's going to be a website: www.wolframalpha.com. With one simple input field that gives access to a huge system, with trillions of pieces of curated data and millions of lines of algorithms," he added.

It follows the Google principle, with a simple input box, but takes a different approach to rendering search results. Nova Spivack, CEO of Radar Networks, which developed Twine, an ambitious "interest network" Web application based on semantic Web technologies, said that Wolfram Alpha may be as "important for the Web (and the world) as Google, but for a different purpose."

Spivack shared his initial impressions of Wolfram Alpha based on a two-hour conversation with Wolfram.

"Wolfram Alpha is like plugging into a vast electronic brain. It provides extremely impressive and thorough answers to a wide range of questions asked in many different ways, and it computes answers, it doesn't merely look them up in a big database."

"In this respect it is vastly smarter than (and different from) Google. Google simply retrieves documents based on keyword searches. Google doesn't understand the question or the answer, and doesn't compute answers based on models of various fields of human knowledge."

Spivack gave some insight into how Wolfram's search engine works:

Wolfram Alpha is a system for computing the answers to questions. To accomplish this it uses built-in models of fields of knowledge, complete with data and algorithms, that represent real-world knowledge.

For example, it contains formal models of much of what we know about science -- massive amounts of data about various physical laws and properties, as well as data about the physical world.

Based on this you can ask it scientific questions and it can compute the answers for you. Even if it has not been programmed explicitly to answer each question you might ask it.

But science is just one of the domains it knows about--it also knows about technology, geography, weather, cooking, business, travel, people, music, and more.

It also has a natural language interface for asking it questions. This interface allows you to ask questions in plain language, or even in various forms of abbreviated notation, and then provides detailed answers.

The vision seems to be to create a system which can do for formal knowledge (all the formally definable systems, heuristics, algorithms, rules, methods, theorems, and facts in the world) what search engines have done for informal knowledge (all the text and documents in various forms of media).

Wolfram's engine isn't going to replace Google, according to Spivack, although he suggests Google would like to own it.

"You would probably not use Wolfram Alpha to shop for a new car, find blog posts about a topic, or to choose a resort for your honeymoon. It is not a system that will understand the nuances of what you consider to be the perfect romantic getaway, for example--there is still no substitute for manual human-guided search for that. Where it appears to excel is when you want facts about something, or when you need to compute a factual answer to some set of questions about factual data."

For now, we'll have to wait until May to see whether the Web and scientific worlds embrace Wolfram Alpha as a major mathematical and engineering breakthrough.

Read Nova Spivack's "Wolfram Alpha is Coming -- and It Could be as Important as Google"

See also: VentureBeat: Wolfram Alpha -- it's like plugging into an electronic brain

October 20, 2008 8:14 PM PDT

Radar Networks delivers Twine 1.0

by Dan Farber

After less than a year, Radar Networks is going from beta to version 1.0 of its Twine "interest network" Web application based on Semantic Web technologies. "We are not spending four years in beta," said Radar Networks CEO Nova Spivack. "We have a minimal set of features ready for prime time."

The minimal set of interest network features allows Twine users to track and discover relevant organizations, products, people, places, tags, and items, such as photos, documents, and recipes, that match their interests. Twine has a social dimension in the way it leverages the wisdom of its members, via bookmarking, tagging, and shared connections. Underlying Semantic Web technologies provide concept mapping (such as interrelationships between topics or people) and more relevant and structured search results.

In the last six months of beta testing, 500,000 users visited Twine and 50,000 remain active, Spivack said. Half of the Twines created are public and members have added about one million items to the database. "The most interesting statistic is time spent--which has risen in the last month to 12 minutes per session and continues to trend up. This is more than tracking and discovery sites like delicious, Digg and StumbleUpon receive," he claimed.

To keep its 50,000 active users and grow its base, Radar Networks made simplifying the user experience the key change from the beta in version 1.0. "When we started beta, Twine was about collecting, organizing, discovering, and sharing, and all were equal. It turns out that tracking interests is the most important, so that is what we are emphasizing," Spivack said. Among the more consumer-friendly enhancements, the site navigation is streamlined, site performance is faster, moderation features are improved, users can invite people from their e-mail address books, and the recommendation engine explains why an item was recommended and allows a user to opt out.

The Semantic Web aspect of Twine, which was touted when it first launched, has been relegated to the background.

"When we first launched, semantic technology was the story," Spivack said. "It was novel then, but now we have to show the value, and to do that we can't emphasize the semantic technologies in Twine. It's under the hood and that is where it belongs. We surface the value of semantic in lots of ways, such as with the recommendations. Next year users will be able to create their own data types and build an ontology without knowing it's an ontology."

In about three weeks, an update to Twine 1.0 will add a more advanced bookmarking tool and natural language crawling to improve relevancy. "Every page added to Twine will use natural language processing to determine what is the content versus ads, navigation, and other elements. We'll put the full text in our search index and generate tags and create a summary and then crawl every link in the text one hop out and bring that content in as well," Spivack said.
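The pipeline Spivack describes (separate content from page furniture, index the full text, generate tags and a summary, then follow each link in the text one hop out) can be sketched roughly as follows. This is only an illustration, not Twine's implementation: the `SKIP_TAGS` heuristic is a crude stand-in for the natural language processing he mentions, and `index_page`, `ingest`, and the `fetch` callback are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Assumption: text inside these elements is treated as ads/navigation.
# A real system would use far more sophisticated content detection.
SKIP_TAGS = {"nav", "aside", "script", "style", "footer"}


class PageParser(HTMLParser):
    """Collects body text and in-text links, ignoring anything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "a" and self.skip_depth == 0:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def index_page(html):
    """Build a search-index record: full text, naive tags, one-sentence summary."""
    parser = PageParser()
    parser.feed(html)
    text = " ".join(parser.text_parts)
    words = re.findall(r"[a-z]{4,}", text.lower())
    tags = [word for word, _ in Counter(words).most_common(3)]
    summary = re.split(r"(?<=[.!?])\s", text, maxsplit=1)[0]
    return {"text": text, "tags": tags, "summary": summary, "links": parser.links}


def ingest(url, fetch):
    """Index the added page, then every page it links to (one hop only)."""
    index = {url: index_page(fetch(url))}
    for link in index[url]["links"]:
        if link not in index:
            # Links found on these second-level pages are not followed.
            index[link] = index_page(fetch(link))
    return index
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client; a dictionary lookup works just as well for trying it out.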

Next year Twine will unleash more of its semantic power, with richer support for structured data and a two-way API for getting data in and out of Twine that will attract application developers, Spivack said. In addition, Twine will introduce a new monetization scheme. "Twine will be to marketing what Google was to advertising," he boasted.

"Advertising is pull-based, passive, and on the side of pages. Marketing is sponsored and highly relevant content that is targeted and delivered to someone in their interest feeds. It will be clearly marked and users have the option to accept or reject it." The concept is similar to what Facebook has attempted to do with its Beacon program. Spivack said that the company has filed patents for its monetization concepts, including a way for users to market semantic objects. "We can provide interesting socio-economics around the content that people collect, share, and buy, and build a one-to-one channel between marketers and users."

Radar Networks is in good shape to weather the economic storm. The company raised $13 million in Series B funding from Velocity Interactive, Draper Fisher Jurvetson, and Vulcan Capital earlier this year.

Below is a video from Radar Networks outlining the new features of Twine 1.0:


Twine Overview from Twine Official on Vimeo

Originally posted at Webware

July 31, 2008 10:16 AM PDT

Radar Networks readies new release of Twine

by Dan Farber

In March, Radar Networks launched Twine, an application that organizes information and connects people, places, companies, products, Web pages, videos, and photos. Along with Metaweb's Freebase, Powerset (sold to Microsoft), Hakia, Reuters' Calais, AdaptiveBlue and a few other start-ups, Radar Networks is trying to crack the code on building a piece of the semantic Web.

In a Times Online article, Web creator Tim Berners-Lee gave an example of how the semantic Web would work:

"Imagine if two completely separate things--your bank statements and your calendar--spoke the same language and could share information with one another. You could drag one on top of the other and a whole bunch of dots would appear showing you when you spent your money."

Twine won't provide that futuristic capability but it attempts to build a "semantic graph" of relationships between content, tags, people and Twines (the collection of items of an individual or group on the service). Each piece of content is a "semantic object," Radar Networks CEO Nova Spivack said, using Twine's underlying ontology and database, which applies semantic technologies such as RDF for storing data.
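To make the idea of a "semantic graph" concrete, here is a minimal sketch of RDF-style triples (subject, predicate, object) held in memory. It assumes nothing about Twine's actual ontology or database: the `SemanticGraph` class and identifiers such as `item:42` and `taggedWith` are invented for illustration.

```python
from collections import defaultdict


class SemanticGraph:
    """A tiny in-memory triple store: (subject, predicate, object)."""

    def __init__(self):
        self.triples = set()
        self.by_subject = defaultdict(set)
        self.by_object = defaultdict(set)

    def add(self, subject, predicate, obj):
        triple = (subject, predicate, obj)
        self.triples.add(triple)
        self.by_subject[subject].add(triple)
        self.by_object[obj].add(triple)

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None acts as a wildcard."""
        if subject is not None:
            candidates = self.by_subject.get(subject, set())
        elif obj is not None:
            candidates = self.by_object.get(obj, set())
        else:
            candidates = self.triples
        return sorted(
            t for t in candidates
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Hypothetical data: one piece of content linked to a tag, a person, and a Twine.
graph = SemanticGraph()
graph.add("item:42", "rdf:type", "Bookmark")
graph.add("item:42", "taggedWith", "tag:semantic-web")
graph.add("item:42", "addedBy", "person:alice")
graph.add("item:42", "inTwine", "twine:search-tech")
graph.add("person:bob", "memberOf", "twine:search-tech")

# Everything connected to a given Twine, whether content or people.
for triple in graph.match(obj="twine:search-tech"):
    print(triple)
```

Because content, tags, people, and Twines all live in the same set of triples, one query pattern can walk relationships between any of them, which is the property the "graph" label is pointing at.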

Spivack told me that public Twines are now visible to visitors to the site and to search engines. So far in the beta phase nearly 15,000 Twines have been created and 354,000 pieces of user-contributed content have been added into the system. More than 50,000 users signed up (34,000 are active) for the service, spending 13 to 15 minutes per session on the site, he said.

A major new release of the Twine platform is slated for the fall to address shortcomings and introduce new features. "We have worked on a lot of simplification, reducing the clutter, and we still need to reduce more. Twine has a lot of powerful features nobody uses, so we are moving some of the advanced features out of the way," Spivack said. "The fall release will bring more intelligence and semantics to the surface. For example, we will let anyone define a new type of thing, such as a recipe or baseball team form, to author. It's more like what Freebase does, and we will also likely integrate with Freebase over time."

In addition, performance improvements and algorithms to improve search as well as mining and crawling content are in the works. "A major focus of our work is on personalization and recommendations," Spivack said. "Ultimately, Twine is about 'interest networking' and is a content distribution network. People declare their interests, add content, join Twines and connect with people. As users work with the system it learns about their interests, using artificial intelligence and semantic Web technologies to provide more relevance. We are not attempting to index the whole Web, just the best stuff of interest to users. Ninety-nine percent of what's on the Web is not interesting to a user, so it's more about high signal to noise."

On the business front, Spivack believes that Twine can be an intermediary for users, delivering more targeted marketing messages in addition to content. It's similar to the way Facebook is creating a new kind of environment for advertising based on knowing member interests and their social or semantic graph. "The goal for Twine is to be the place on the Web that best understands your interests and represents them to others. The key is to give users control and privacy," Spivack said.

Twine is a work in progress. It's ambitious and has the potential to demonstrate how a more semantic Web could benefit users. The biggest challenge will be scaling the back-end infrastructure and attracting users, which means Twine will have to become far easier to configure and use. We'll see in the coming months whether the forthcoming changes to Twine help open the floodgates.

Updated numbers on users and usage, 6:30 AM PST, August 1

April 16, 2008 7:15 AM PDT

On the road to the Semantic Web

by Dan Farber

The Semantic Web has been just around the corner for a few years. It turns out that bringing a semantic layer of metadata to the Internet is like climbing a mountain in flip-flops.

Tuesday night, Semantic Web mountain climbers Powerset, Radar Networks, and Metaweb participated in a salon at Powerset's San Francisco office, where I talked with them about their product plans.

Powerset gives wings to Wikipedia
I got a preview of Powerset's search engine, which is due to go into beta in the coming weeks, according to co-founder and CTO Barney Pell and as reported by TechCrunch.

Powerset differs from Google and other mainstream search engines in that it linguistically parses sentences, finding subjects, verbs, objects, synonyms, and other elements using a highly sophisticated, language-independent parser licensed from Xerox PARC.

Powerset then extracts and indexes concepts, relationships, and meanings, rather than keywords. (I wrote about Powerset when it first came out of stealth mode, in June 2007.)

Rather than trying to boil the search ocean, compete with Google, and deal with spam and 20 billion documents, Powerset has focused its initial efforts on giving wings to the 3 million pages of Wikipedia.

Hakia's semantic search engine also indexes Wikipedia and other sources. However, Powerset returns a more comprehensive dossier of results for queries, based on deep analysis of Wikipedia pages and other content, and also provides new ways to navigate and discover facts on the individual Wikipedia pages. More details to come when Powerset officially launches its public beta version.

Powerset plans to index the Web at some point (at a significant cost, in terms of servers and bandwidth). For now--or more precisely, when the company allows the public access to its technology--Wikipedia users will be the beneficiaries of a powerful semantic index and user experience.

True Knowledge
I also got a look at True Knowledge's search engine. Company CEO William Tunstall-Pedoe said the search engine is in private beta for now, with about 7,000 users.

Unlike Powerset and other search engines, Cambridge, England-based True Knowledge is building its own knowledge base. Users input facts, as in Wikipedia, but in a more structured manner. In addition, True Knowledge imports data from sources, including Wikipedia, in the form of discrete facts, such as "Sacramento is the capital of California."

Queries, including those in natural language, are parsed into machine-readable form and run against the accumulated repository of facts. True Knowledge can make inferences, such as in the following example.

(Credit: True Knowledge)

The capability to infer truths based on the data repository would be a welcome feature for Wikipedia, which doesn't have an automated method for dealing with contradictions.
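As a rough illustration of what inferring from discrete facts and flagging contradictions can look like, here is a small sketch. It is not True Knowledge's engine: the two rules, the predicate names, and the one-capital-per-region constraint are assumptions chosen to fit the Sacramento example.

```python
# Discrete facts as (subject, predicate, object) tuples; predicate names are hypothetical.
facts = {
    ("Sacramento", "is_capital_of", "California"),
    ("California", "is_located_in", "United States"),
}


def infer(facts):
    """Apply two illustrative rules repeatedly until no new facts appear."""
    known = set(facts)
    while True:
        new = set()
        for s, p, o in known:
            # Rule 1: a capital is located in the region it is the capital of.
            if p == "is_capital_of":
                new.add((s, "is_located_in", o))
            # Rule 2: is_located_in is transitive.
            if p == "is_located_in":
                for s2, p2, o2 in known:
                    if p2 == "is_located_in" and s2 == o:
                        new.add((s, "is_located_in", o2))
        if new <= known:
            return known
        known |= new


def contradictions(facts, unique_per_object=("is_capital_of",)):
    """Find pairs of facts that give one object two different subjects."""
    seen = {}
    clashes = []
    for s, p, o in sorted(facts):
        if p in unique_per_object:
            if (p, o) in seen and seen[(p, o)] != s:
                clashes.append(((seen[(p, o)], p, o), (s, p, o)))
            else:
                seen[(p, o)] = s
    return clashes


derived = infer(facts)
# Derived even though nobody entered it directly:
print(("Sacramento", "is_located_in", "United States") in derived)

# A conflicting submission is caught rather than silently stored.
print(contradictions(facts | {("Los Angeles", "is_capital_of", "California")}))
```

The contradiction check only works because facts are structured: two free-text sentences naming different capitals would be much harder to compare automatically.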

Barney Pell (Powerset), William Tunstall-Pedoe (True Knowledge), Nova Spivack (Radar Networks), Paul Davison (Metaweb)

(Credit: Dan Farber/CNET News)

Metaweb
Another San Francisco Semantic Web start-up, Metaweb, was also a participant in the salon. The company's Freebase is more similar to True Knowledge than to Powerset.

Freebase is a community-built database with a large corpus of open data sets, including Wikipedia and MusicBrainz. Powerset includes some Freebase-structured content in its index, and True Knowledge could add Freebase data to its knowledge repository.

Radar Networks' Twine
I also chatted with Nova Spivack, co-founder and CEO of Radar Networks. His company created Twine, an application combining bookmarking, blogging, and RSS reading, with an underlying semantic engine to tie the pieces of data together.

Spivack said Twine has about 7,000 users in private beta, as well as 40,000 standing in line for access. Half of the users have created private Twines, with corporations and closed communities of interest using the service for collaboration.

Major enhancements are planned for the summer and fall, including allowing for complete customization of the user interface. "We have only surfaced a bit of the platform so far. Twine as a platform will integrate with other applications, such as blogs, catalogs, social communities, and corporate sites," he told me.

"It's an enormous multiyear project," Spivack said. "It's not like a Google beta or a 1.0 version masquerading as a beta." The same could be said of the other Semantic Web services in the room. It's going to be a very long beta cycle.

March 11, 2008 10:55 PM PDT

Tim Berners-Lee: Google could be superseded by the Semantic Web

by Dan Farber

The inventor of the World Wide Web, Sir Tim Berners-Lee, isn't content to rest on his laurels. At every opportunity he talks up the Semantic Web, which he calls the "Web of the future."

In a recent article in the Times Online, he said that what Google has done so far pales in comparison with what the Semantic Web will bring. Social-networking leaders Facebook and MySpace will eventually be trumped by networks that connect all types of things, not just people, he said. To be clear, he wasn't saying that Google is doomed.

In the Times Online article, Berners-Lee gave an example of how the Semantic Web would work:

"Imagine if two completely separate things--your bank statements and your calendar--spoke the same language and could share information with one another. You could drag one on top of the other and a whole bunch of dots would appear showing you when you spent your money."

"If you still weren't sure of where you were when you made a particular transaction, you could then drag your photo album on top of the calendar, and be reminded that you used your credit card at the same time you were taking pictures of your kids at a theme park. So you would know not to claim it as a tax deduction."

Google's technology and approach to parsing the Web is based on statistical analysis of incredibly vast amounts of data. The Semantic Web involves creating a layer of metadata that enables rich connections between any type or piece of data.
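Berners-Lee's bank statement and calendar example comes down to separate data sets sharing a common vocabulary, here simply an agreed `date` field. The sketch below is a toy under that assumption; the records, field names, and the `overlay` function are all invented, and real Semantic Web data would use shared schemas rather than Python dictionaries.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records from three unrelated applications.
transactions = [
    {"date": date(2008, 3, 1), "payee": "Theme Park Cafe", "amount": 42.50},
    {"date": date(2008, 3, 4), "payee": "Office Supply Co", "amount": 18.99},
]
calendar = [
    {"date": date(2008, 3, 1), "event": "Family day at theme park"},
    {"date": date(2008, 3, 4), "event": "Client meeting"},
]
photos = [
    {"date": date(2008, 3, 1), "caption": "Kids on the carousel"},
]


def overlay(**sources):
    """Group records from every source by their shared 'date' field."""
    timeline = defaultdict(lambda: defaultdict(list))
    for name, records in sources.items():
        for record in records:
            timeline[record["date"]][name].append(record)
    return timeline


timeline = overlay(transactions=transactions, calendar=calendar, photos=photos)
for day in sorted(timeline):
    # Each day shows whatever the sources know about it, side by side.
    print(day.isoformat(), {name: len(items) for name, items in timeline[day].items()})
```

The "drag one on top of the other" step is the grouping inside `overlay`; it only works because all three sources describe dates the same way, which is the kind of agreement a metadata layer is meant to supply.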

In 2006, Peter Norvig, Google's director of research, noted some challenges to building a Semantic Web, such as creating the metadata, agreeing on standards, and gaming the system.

"We deal with millions of Web masters who can't configure a server, can't write HTML. It's hard for them to go to the next step. The second problem is competition. Some commercial providers say, 'I'm the leader. Why should I standardize?' The third problem is one of deception. We deal every day with people who try to rank higher in the results and then try to sell someone Viagra when that's not what they are looking for. With less human oversight with the Semantic Web, we are worried about it being easier to be deceptive."

Peter Norvig, Google director of research

However, Norvig does envision a Web of connections far down the road. In a New Scientist article projecting into the future he stated:

In 50 years the scene will be transformed. Instead of typing a few words into a search engine, people will discuss their needs with a digital intermediary, which will offer suggestions and refinements. The result will not be a list of links, but an annotated report (or a simple conversation) that synthesises the important points, with references to the original literature. People won't think of "search" as a separate category--it will all be part of living.

The digital intermediary Norvig mentioned will be informed by semantic metadata, and search engines will take advantage of semantic metadata to deliver more precise and richer results.

Building Semantic Web applications has proven to be challenging so far. For example, Radar Networks just released a public beta of Twine, a personal information manager that uses Semantic Web technology, such as RDF (Resource Description Framework). With Twine, Radar Networks is trying to unleash the "semantic graph," which turns people, places, companies, products, Web pages, videos, photos and other data into Semantic Web content, according to Nova Spivack, CEO of the company.

Twine has met with some early criticism.

In response to the critique, Spivack wrote, "Twine is already far and beyond what any other semantic app I know of is capable of, but that still isn't good enough. We have to push further and focus more on usability. We are opening it up early in order to get feedback and more help testing and guiding the direction of the app from users."

"Ultimately, we will be the category killer for bookmarking, taking notes and organizing information," Spivack proclaimed to me in conversation today, noting that reaching a level of success was "definitely going to take time."

Radar Networks is not alone in trying to turn Semantic Web concepts into usable products. Other efforts, such as Freebase, Powerset, Hakia, BlueOrganizer, Wikia and Reuters' Calais, face a similar uphill climb to gain adoption.

What's evident is that Berners-Lee continues to be ahead of the curve. Just as the Internet was in gestation for decades, a semantic layer at the core of the Web will take decades to evolve.

February 25, 2008 6:54 AM PST

Radar Networks takes $13 million, readies Twine for the public

by Dan Farber

Radar Networks is prepping for a March public beta of Twine, a Web application that organizes information into a "semantic graph," connecting people, places, companies, products, Web pages, videos, and photos, and turning it into Semantic Web content.

Nova Spivack

(Credit: Radar Networks)

In addition, the company raised $13 million in Series B funding from Velocity Interactive, Draper Fisher Jurvetson, and Vulcan Capital. The new capital will go toward building out the back-end infrastructure, which can be substantial as Semantic Web applications process and store large amounts of data, as well as adding staff as the business scales up, Radar Networks founder Nova Spivack said. The company raised $5 million in Series A funding in April of 2006 from Vulcan Capital, Leapfrog, and angel investor Ron Conway.

Twine has been in private beta with a few hundred users since November 2007. "We have 30,000 users on a wait list, and we will let them in 1,000 at a time in our first week in the market," Spivack said. "The next phase will give us tons of feedback, and we will continue to fix things and add new features, but a lot of it is there already and you can get a feel for where it is headed."

"Twine is a new service for knowledge networking, sharing, organizing and finding information from people you trust," Spivack explained when the application was first introduced in October 2007. "Unlike a social network that is about who you know, Twine is more about what you know."

He also describes Twine as "Web 2.0 with a brain," and as a milestone in making the Semantic Web useful to end users. (See my earlier post on Twine.)

Twine is similar in concept to Facebook and other services that aggregate relevant feeds and notifications. Twine categorizes people, places, organizations, and other concepts.

(Credit: Radar Networks)

Twine will be ad-supported, with limits on storage and the number of advanced features for the free version. A subscription-based, premium-content service is also planned.

Twine isn't the first application to apply Semantic Web principles (extracting meaning, and classifying and relating data), with or without using Semantic Web standards such as RDF, OWL and SPARQL (the query language for RDF).

AdaptiveBlue's BlueOrganizer, for example, knows about things like music, books, wine and travel destinations, but doesn't use RDF or other Semantic Web standards. Metaweb Technologies' Freebase is like an open public almanac that includes structured information on topics such as movies, music, people and locations.

See also Paul Miller's ZDNet take on Radar Networks' news.

Originally posted at News Blog

About Outside the Lines

Dan Farber is the editor in chief of CNET News. He has covered technology for more than two decades, and he previously served as editor in chief of ZDNet, PC Week and MacWeek. Outside the Lines explores the intersection of business and technology.
