April 8, 2008 7:42 PM PDT

Open-sourcing factual data, Wikipedia style

by Dan Farber

Bret Taylor, formerly of Google and now of FriendFeed, has a greater appreciation for the business development function. In a post today he wrote about the challenges of getting legal access to factual data--such as mapping, stock quotes, white pages, TV schedules, movie show times, and sports scores--for use in applications.

If you want to experiment with a new driving directions algorithm, it is infinitely more difficult than coming up with an algorithm; you have to hire a lawyer and sign a contract with a company that collects that data in the country you are developing for.

Bret Taylor: Free the data

He adds that some of the data has quality problems or is incomplete. In sum, Taylor believes that innovation is stymied and the barrier to entry is raised in the current environment. It's not just the need for lawyers and contracts but also the issue of companies that sell data restricting use.

What is the solution to freeing up the data? Taylor advocates open-sourcing factual data, and competing on use of the data, not access to it. He wrote:

To this end, I think we should create a Wikipedia for data: a global database for all of these important data sources to which we all contribute and that anyone can use. When a user reports an inaccurate phone number in your products, save it back to the DataWiki so everyone can benefit, and in return, you get everyone else's improvements as well. If your local movie theater doesn't have listings data in DataWiki, you can type it in yourself, and everyone in your town can benefit, and all the products you use that access movie listings will automatically update. Need better mapping data for a city? Pay to collect it, and upload it to the DataWiki. In return you get all the other cities other companies paid for (sort of like a company contributing device drivers to the Linux kernel).

For centuries, companies have made money in exchange for doing the busy work of collecting, massaging, and publishing factual data. The same was true for encyclopedia data until recently. Taylor is definitely onto something, but his proposal presents some real data collection challenges. The open-source community is sure to take up the challenge.

The question is, will the companies that already have the data be of assistance? It's not exactly in their best financial interest to give away their content, but the example of Wikipedia should give them reason to pause and reconsider.

See also: Sarah Perez discusses where to find open data on the Web, such as CKAN (Comprehensive Knowledge Archive Network), OpenStreetMap and Freebase.

Dan Farber is editor in chief of CBS Interactive News, which includes CBSNews.com and CNET News. He has more than 25 years of experience as an editor and journalist covering technology. E-mail Dan.
by BradPatrick April 9, 2008 7:12 AM PDT
You might want to check out www.freebase.com (alpha). That is the project that is closest to what Bret is wanting to develop. It is in its infancy, but shares the same OS philosophy and model.
by napm1971 April 9, 2008 10:01 AM PDT
Oh there's a lot to say here... I feel a blog post coming on!

In shorthand, though, it's worth checking out the work around appropriate - open - licensing for this data at opendatacommons.org/

I also have a paper at the World Wide Web conference later this week, which digs into the licensing and economic issues a little further... http://events.linkeddata.org/ldow2008/papers/08-miller-styles-open-data-commons.pdf

See also http://blogs.talis.com/nodalities/2007/12/licensing_open_data_creative_c.php for some history of this collaboration between ourselves at Talis, the Science Commons project of Creative Commons, and a pair of very smart lawyers: Jordan Hatcher and Charlotte Waelde.
by yum8yuk April 9, 2008 10:04 AM PDT
The man has very nice intentions. But I think there is a 1 in a billion chance that the wiki will force data hogs to give up their hard-earned data gathering for free. I know I will never do that. In fact, wouldn't many of these data companies go out of business if they gave their data away for free? It's no mystery that data gathering is a billion-dollar business. If you are trying to update a phone number, this may be OK. I wish you luck, brother.
by krosavcheg April 9, 2008 2:50 PM PDT
I suggest checking out http://www.numberzoom.com/ which is a wiki of user contributed phone listings for unknown caller IDs. I saw an article on the company in the nytimes.
by walwebster April 10, 2008 1:02 AM PDT
I've just spent a few years designing, building and populating some private databases containing the names and certain other relevant details about many thousands of people with certain attributes in common. Now tell me again why I want to give them away for nothing? Other than because you might prefer not to pay for them, that is. It's all public-domain information, after all -- in a free market, you're welcome to get that same data from anyone else who'll put in the same work, over the same length of time, and then sell their output to you at a more attractive price than mine. (I wouldn't be holding my breath till that happens, though ...)

Issues of data quality and completeness tend to reflect the old adage that "you get what you pay for".
by thekohser April 10, 2008 6:23 AM PDT
Correcting incorrect data about a person (birthdate, current employer, marital status, etc.) or a private enterprise (P/E ratio, movie times, hours of operation) is the responsibility of the PERSON or the ENTERPRISE. It is they who fail to optimize their gains by allowing incorrect data into the marketplace, so it is they who should be concerned about correcting it. Not "volunteer" data geeks, and especially not vandals from Ralph's Pizza who might want to change the hours of operation at Joe's Pizza to "closed Saturdays and Sundays".

Please. Sound, reliable, accurate databases are built on the self-interest of those whose data is represented and on the reputation of the agent who is assembling said data. The model described above isn't going to work.
About Outside the Lines

Dan Farber is the editor in chief of CNET News. He has covered technology for more than two decades, and he previously served as editor in chief of ZDNet, PC Week and MacWeek. Outside the Lines explores the intersection of business and technology.
