CAMBRIDGE, Mass.--Harvard Law and Berkman Center scholar Yochai Benkler and Wikipedia founder Jimmy Wales deconstructed Wikipedia and discussed peer production models at an event here Thursday.
Benkler, who is the Jack N. and Lillian R. Berkman Professor of Entrepreneurial Studies at the Harvard Law School and co-director of the Berkman Center, and Wales were participating in a program marking the Berkman Center's 10th anniversary at the Harvard Law School (see my earlier coverage of the conference). Wales is a Berkman Fellow whose research aims to find ways for groups to come to better decisions.
Jimmy Wales: Given enough time, humans will screw up Wikipedia just as they have screwed up everything else, but so far it's not too bad.
(Credit: Dan Farber)
During his remarks, Wales outlined what makes Wikipedia different, in light of the perception that the world's most-relied-upon information resource is counterintuitive. The following are notes from his remarks at the session (in his voice):
There were a lot of mistakes made in the early social design of the Internet. In the unmoderated Usenet groups it was difficult to control or exclude bad behavior. That led to spam, trolls, and flamewars, which gave the Internet a bad name in some circles, and the problem still exists today.
Against that background, with the worst being brought out in people and the community lacking any means to self-regulate, you end up with a top-down police state to manage it.
To people with that view, it seemed obvious that letting anyone edit anything at any time would make the Internet worse, because most people were horrible. I use the analogy of a restaurant. Suppose you've been given the task of designing a restaurant that serves steaks. Customers will have access to knives, and people with knives might stab people, so you need to keep people in cages. This model makes for a bad society, and it reflects a view of human nature that we mostly avoid except at the airport.
People get the idea that the only way to design a social space is to have top-down control. Wikipedia is more like a real restaurant--people go in and eat and don't start stabbing people, and there are tools and institutions to deal with misbehavior.
The technology allows us to have a space that is safe, where you can block the worst offenders. But how does neutrality fit into this?
Neutral Point of View (NPOV) is absolute and non-negotiable in Wikipedia. The problems come up in obscure topics, such as Japanese anime. For common topics, people come together and make a decent statement on what it is. It turns out that what is really important is that participants have a shared vision of what they want to accomplish.
Mutually assured destruction is inherent in Wikipedia. People who want to push an agenda end up having to write "for the enemy" rather than for those who share the same bias. Most people are pretty reasonable, but you don't get that sense from TV, where they put up two people on opposite sides. Most people are in the middle and aware of the pros and cons of issues.
We are really strongly focused on consensus. One criticism of Wikipedia is that the majority rules. But majority is not the right way to describe it. We strive for consensus rather than majority rule. If you are in a group of five or ten people working on an article and 30 percent of them dissent, you continue to work on the article until all but the most unreasonable agree. Those who continue to disagree are typically exhibiting non-collaborative, and sometimes abusive, behavior.
As Wikipedia has grown, subcommunities have formed, and there is a risk that interactions shift from small groups to more atomistic, random people. That makes it a lot harder to maintain civility, because it's a lot harder to be rude to people you know.
Concerns about companies working on their own entries are mostly overblown. We see that a lot more from small mom-and-pop companies trying to get an article on Wikipedia. A lot of communications professionals understand that interacting with social media requires accepting the norms.
I definitely think we have a problem with the amount of tradition and jargon. People trying to change their biographies, for example, found their changes were reverted with strange codes as an explanation. People should not be required to become expert Wikipedians to join the conversation. It gets really hard when there is too much jargon.
How does Wikipedia decide what is published? The community decides on a case-by-case basis. Wikipedia has gotten bigger in two ways. First is the sheer size of the work: when we started out we were covering George Bush and Michael Jackson, and they are so famous they don't care. Second, we have become very powerful in search engines like Google, so it actually matters to people. Because of those two factors, the community is thinking much more about how to reflect thoughtfully on the question. We look for reliable sources--verifiability. Someone could start an article about their mother, but if she is not well known, who can verify it? So we can't have an article about her. We also look at the question of human dignity. One of the rules for biographies is that if the person is notable for only one event, perhaps did some bad thing, and it's a unique, odd event, we typically try to have an article about the event, not the person.
Intentional vulnerability is really important. Sometimes it's reported that a Wikipedia page was hacked, which we chuckle at. It doesn't take advanced computer skills to "hack" Wikipedia. Put a curse word on Wikipedia and we fix it in one or two seconds, so it's not that thrilling. We do actually lock the front page, though.
Wikia and Wikipedia are completely separate. The only link between the two is that Wales is the founder of both. Wikipedia could follow the model of Mozilla (which makes money from Firefox), but nobody is thinking about it that much.
Given enough time, humans will screw up Wikipedia just as they have screwed up everything else, but so far it's not too bad.
Yochai Benkler: We are building systems loosely coupled because we can't design perfect systems.
(Credit: Dan Farber)
As you would expect, Benkler took a more academic approach to deconstructing Wikipedia. "Ten years ago we would not have had this conversation," he said, referring to the rapid changes in the last decade. "We are moving generationally from the '90s of imagining the world and projecting hopes and fears to a more detailed analysis, moving beyond hoping to organizing our research and getting large-scale data and new modes of analysis."
The author of The Wealth of Networks, Benkler said he had been studying Wikipedia since it was four months old. He compared it to the Encyclopedia Britannica, which represents the "structure of authority over knowledge," rather than the process of conversation and human interaction as in Wikipedia. He noted that Wikipedia has moved from being quirky and on the side to something that is mainstream.
"Encyclopedia Britannica is a stable view of knowledge embedded in a human relation and legal system. It was challenged by a much more loosely coupled system that allows for much greater change and unpredictability, and requires more learning and critique," Benkler said. It requires the freedom to change, the will to engage and a certain cooperation dynamic, he added.
Benkler concluded that a very different model of human motivation is needed, one that is much more capable of cooperation. It will require looking at many disciplines, including experimental economics, game theory, and organizational sociology. "We are building systems loosely coupled because we can't design perfect systems. We have to allow freedom as a practical human agency designed for cooperation to replace rational actor model with something much more rich and close to the way the conversation is," Benkler said.
Amid speculation that Microsoft is looking to make an acquisition, Powerset launched a public beta of its Wikipedia search engine. It brings a new, rich semantic dimension via natural language query processing to Wikipedia that greatly improves the search and reading experience.
The company calls it a first step in changing the way users search and consume Web content. "It's a complete shift. You see this and you want to experience all content in this way," Barney Pell, co-founder and CTO of Powerset, told me. "And, as an introduction, it will drive huge investment in semantic and linguistic technology, just as investments were made in information retrieval and scalable databases in the past. People working in this space will be very marketable."
Users can enter keywords, phrases, or simple questions in Powerset's search box. Like many Web startups, Powerset is currently free of advertising.
Powerset's natural language search technology is based on patents licensed exclusively from PARC and its own proprietary indexing. Powerset's engine has read 2.5 million Wikipedia pages and extracted "meaning" from the sentences, creating a navigation and semantic layer on top of the popular Web encyclopedia. Following is a pictorial tour of Powerset features:
Powerset has also indexed Freebase, Metaweb's evolving, open database of structured information. The search result page presents Factz, a summary of key information extracted from Wikipedia pages.
Factz can be expanded to display more of the extracted verbs and their associated words and concepts.
Powerset creates a summary of information, or Dossier, on the right side of the page with Freebase and Wikipedia to give users a quick outline view about a topic. Clicking on an item takes the user to the location in the article and highlights the reference.
Powerset generates a summary of the key Factz to create a kind of Cliff's Notes version of a Wikipedia article. Clicking on a summary item takes the user to the reference location in the article and highlights the key words. Powerset also includes a page for disambiguation of queries.
Powerset also shows a tag cloud of things and actions found by its linguistic analysis engine on the page. Clicking on a word shows related Factz in the outline.
Powerset can provide direct answers to queries from its Wikipedia and Freebase index, and highlight the most relevant search results based on the meaning of the query. Hakia, another semantic search engine, as well as Google, can also surface the date Picasso was born at the top of their results pages.
Powerset's Wikipedia search engine isn't going to slow down Google in the near term, but it will raise the bar on the search experience for all players. "There are implications beyond Wikipedia," Pell said. "Search is not done. You can see the emerging Semantic Web with our integration of Wikipedia and Freebase. We will add other components with structured data and ways to answer questions."
Powerset has said that the longer-term plan is to read, linguistically analyze, and index 20 billion documents on the Web, which will be a costly and ambitious undertaking. (Getting acquired by Microsoft would be helpful for that project. Powerset received $12.5 million in Series A funding from Foundation Capital, Founders Fund, and angel investors in 2006.)
Bret Taylor, formerly of Google and now of FriendFeed, has a greater appreciation for the business development function. In a post today he wrote about the challenges of getting legal access to factual data--such as mapping, stock quotes, white pages, TV schedules, movie show times, and sports scores--for use in applications.
If you want to experiment with a new driving directions algorithm, it is infinitely more difficult than coming up with an algorithm; you have to hire a lawyer and sign a contract with a company that collects that data in the country you are developing for.
Bret Taylor: Free the data
He adds that some of the data has quality problems or is incomplete. In sum, Taylor believes that innovation is stymied and the barrier to entry is raised in the current environment. It's not just the need for lawyers and contracts but also the issue of companies that sell data restricting use.
What's the solution to freeing up the data? Taylor advocates open-sourcing factual data, and competing on use of the data, not access to it. He wrote:
To this end, I think we should create a Wikipedia for data: a global database for all of these important data sources to which we all contribute and that anyone can use. When a user reports an inaccurate phone number in your products, save it back to the DataWiki so everyone can benefit, and in return, you get everyone else's improvements as well. If your local movie theater doesn't have listings data in DataWiki, you can type it in yourself, and everyone in your town can benefit, and all the products you use that access movie listings will automatically update. Need better mapping data for a city? Pay to collect it, and upload it to the DataWiki. In return you get all the other cities other companies paid for (sort of like a company contributing device drivers to the Linux kernel).
For centuries, companies have made money in exchange for doing the busy work of collecting, massaging, and publishing factual data. The same was true for encyclopedia data until recently. Taylor is definitely onto something, but it presents some real data collection challenges. The open-source community is sure to take up the challenge.
The question is, will the companies that already have the data be of assistance? It's not exactly in their best financial interest to give away their content, but the example of Wikipedia should give them the incentive to press the pause button.
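To make Taylor's proposal concrete, here is a minimal sketch of the contribute-and-share loop he describes: one product saves a correction back, and every other product reading the same record sees it. No DataWiki exists to borrow from, so everything here is an assumption for illustration, including the `DataWiki` and `Revision` classes, the `contribute` and `lookup` methods, the dataset names, and the sample phone numbers. A real service would also need networking, conflict handling, and vandalism controls, none of which is shown.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    value: str        # the fact as supplied, e.g. a phone number
    contributor: str  # which product or person supplied it


@dataclass
class DataWiki:
    """Hypothetical in-memory stand-in for a shared store of factual data."""
    # Maps (dataset, key) to the full revision history, oldest first.
    records: dict = field(default_factory=dict)

    def contribute(self, dataset, key, value, contributor):
        """Save a new or corrected fact; earlier revisions are kept."""
        history = self.records.setdefault((dataset, key), [])
        history.append(Revision(value, contributor))

    def lookup(self, dataset, key):
        """Return the latest value for a fact, or None if nobody supplied one."""
        history = self.records.get((dataset, key))
        return history[-1].value if history else None


wiki = DataWiki()
# Product A uploads a listing; a user of product B later reports a fix.
wiki.contribute("white_pages", "Joe's Diner", "555-0100", "product_a")
wiki.contribute("white_pages", "Joe's Diner", "555-0199", "product_b")

# Product A now reads the corrected number without doing any extra work.
print(wiki.lookup("white_pages", "Joe's Diner"))
```

Keeping the full revision history rather than overwriting values is a design choice borrowed from wikis in general: it is what would let a bad edit be reverted.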
See also: Sarah Perez discusses where to find open data on the Web, such as CKAN (Comprehensive Knowledge Archive Network), OpenStreetMap and Freebase.





