Open-sourcing factual data, Wikipedia style
Bret Taylor, formerly of Google and now of FriendFeed, has a greater appreciation for the business development function. In a post today he wrote about the challenges of getting legal access to factual data--such as mapping, stock quotes, white pages, TV schedules, movie show times, and sports scores--for use in applications.
If you want to experiment with a new driving directions algorithm, it is infinitely more difficult than coming up with an algorithm; you have to hire a lawyer and sign a contract with a company that collects that data in the country you are developing for.

Bret Taylor: Free the data
He adds that some of the data has quality problems or is incomplete. In sum, Taylor believes that innovation is stymied and the barrier to entry is raised in the current environment. It's not just the need for lawyers and contracts but also the issue of companies that sell data restricting use.
What is the solution to freeing up the data? Taylor advocates open-sourcing factual data and competing on use of the data, not access to it. He wrote:
To this end, I think we should create a Wikipedia for data: a global database for all of these important data sources to which we all contribute and that anyone can use. When a user reports an inaccurate phone number in your products, save it back to the DataWiki so everyone can benefit, and in return, you get everyone else's improvements as well. If your local movie theater doesn't have listings data in DataWiki, you can type it in yourself, and everyone in your town can benefit, and all the products you use that access movie listings will automatically update. Need better mapping data for a city? Pay to collect it, and upload it to the DataWiki. In return you get all the other cities other companies paid for (sort of like a company contributing device drivers to the Linux kernel).
For centuries, companies have made money in exchange for doing the busy work of collecting, massaging, and publishing factual data. The same was true for encyclopedia data until recently. Taylor is definitely onto something, but his proposal presents some real data collection challenges. The open-source community is sure to take up the challenge.
The question is, will the companies that already have the data be of assistance? It's not exactly in their best financial interest to give away their content, but the example of Wikipedia should give them reason to pause and reconsider.
See also: Sarah Perez discusses where to find open data on the Web, such as CKAN (Comprehensive Knowledge Archive Network), OpenStreetMap and Freebase.
Dan Farber is editor in chief of CBS Interactive News, which includes CBSNews.com and CNET News. He has more than 25 years of experience as an editor and journalist covering technology.





In shorthand, though, it's worth checking out the work around appropriate - open - licensing for this data at opendatacommons.org/
I also have a paper at the World Wide Web conference later this week, which digs into the licensing and economic issues a little further... http://events.linkeddata.org/ldow2008/papers/08-miller-styles-open-data-commons.pdf
See also http://blogs.talis.com/nodalities/2007/12/licensing_open_data_creative_c.php for some history of this collaboration between ourselves at Talis, the Science Commons project of Creative Commons, and a pair of very smart lawyers: Jordan Hatcher and Charlotte Waelde.
Issues of data quality and completeness tend to reflect the old adage that "you get what you pay for".
-
by thekohser
April 10, 2008 6:23 AM PDT
- Correcting incorrect data about a person (birthdate, current employer, marital status, etc.) or a private enterprise (P/E ratio, movie times, hours of operation) is the responsibility of the PERSON or the ENTERPRISE. It is they who fail to optimize their gains by allowing incorrect data into the marketplace, so it is they who should be concerned about correcting it. Not "volunteer" data geeks, and especially not vandals from Ralph's Pizza who might want to change the hours of operation at Joe's Pizza to "closed Saturdays and Sundays".
Please. Sound, reliable, accurate databases are built on the self-interest of those whose data is represented and on the reputation of the agent who is assembling said data. The model described above isn't going to work.