Raw data now

From Semantic Portal Wiki

Group Members

Dongwoo Kim
Sheila Kinsella
Oshani Seneviratne
Zhenning Shangguan

Problem Statement

A great deal of data is locked up in silos. We are trying to formalise how to convince data providers to open this data for the benefit of the community as a whole, and how to make doing so easy for them.

Motivating Scenario

File:Walled garden.jpg

The Economist laments: "The problem with today's social networks is that they are often closed to the outside web."

Illustration by David Simonds. Reproduced from "Everywhere and nowhere", May 19, 2008. Economist 2008.

Potential Benefits

  • Increased Visibility: through search engine discoverability (e.g. GoodRelations data embedded in RDFa).
  • Transparency: which in turn can build customer loyalty.
  • Diverse Added Value: different communities can make different use of the same data to suit different needs.
  • For Government/non-profit: other people can do analysis and visualizations and produce interesting results for the data provider, which could be used for decision making.

Potential Concerns

Technical

  • Formats - exporters/converters
  • Documentation: How to extract meaning out of the data
  • Entity resolution - how do we match entities/concepts in order to interlink datasets?
    • wait for natural convergence?
    • a centralised repository e.g. OKKAM
    • ID Commons
    • FOAF+SSL
  • Correctness of Data
    • Example: in DBpedia version 3.2 the president of the US was G. W. Bush, while in the later version 3.3 the president of the US should be Barack Obama.
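
As a rough illustration of the entity resolution concern listed above, the sketch below proposes candidate links between two datasets by comparing normalized labels. It is only one simple heuristic, not the approach taken by OKKAM, ID Commons or FOAF+SSL; the function names, identifiers and labels are hypothetical.

```python
import unicodedata


def normalize(label: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents)
    return " ".join(kept.lower().split())


def candidate_links(dataset_a: dict, dataset_b: dict) -> list:
    """Pair identifiers whose labels normalize to the same key."""
    index = {}
    for ident, label in dataset_b.items():
        index.setdefault(normalize(label), []).append(ident)
    links = []
    for ident, label in dataset_a.items():
        for match in index.get(normalize(label), []):
            links.append((ident, match))
    return links


# Hypothetical identifiers and labels, for illustration only.
a = {
    "http://example.org/a/1": "Barack Obama",
    "http://example.org/a/2": "Ordnance Survey",
}
b = {
    "http://example.org/b/x": "barack  obama",
    "http://example.org/b/y": "AOL",
}
links = candidate_links(a, b)
```

Matches found this way are only candidates: identical labels can refer to different entities, so a real interlinking effort would still need review or additional evidence.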

Legal

  • Intellectual property rights
    • licenses to protect data, e.g. permitting non-commercial use only, with legal action possible against other uses

Economic

  • Will they lose money from this - is Open == Free? For example, opening up the UK geographical data collected by the Ordnance Survey is hard because of the specific economic model it has adopted in the past.
  • Trade secrets - will they lose out to the competition? Can competitors gain an advantage from a company making its data available?

Social

  • Privacy, for example unforeseen consequences (AOL did not anticipate the privacy breaches caused by releasing its usage data)
    • Data is open to abuse - mining to get customer information
  • Make sure original source gets credit for the data
    • credit includes various aspects: social reputation, economic compensation

Possible Solutions

  • Allow data providers to publish the data in their own format, and let tool builders develop tools that convert it to other formats or visualize it.
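
A minimal sketch of what such an independent converter might look like, assuming the provider publishes CSV and a tool builder wants N-Triples. The base URI, the `item/` and `prop/` path segments and the function names are all hypothetical choices made for this example, and every value is emitted as a plain string literal.

```python
import csv
import io
from urllib.parse import quote


def escape_literal(value: str) -> str:
    """Escape the characters that must not appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text: str, base: str, id_column: str) -> list:
    """Turn each CSV cell into one triple: row id, column name, cell value."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base}item/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip the id column, empty cells and malformed extra cells.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{base}prop/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


# Hypothetical provider data, for illustration only.
sample = "id,name,price\n42,Road map,9.99\n"
triples = csv_to_ntriples(sample, "http://example.org/", "id")
```

The provider only has to keep publishing the CSV; the mapping to another format lives entirely in the tool builder's code.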

Technical

Problem | Solution | Responsibility
Different formats | Exporters and converters | Independent tool developers
Documentation: how to extract meaning out of the data | Self-describing data (apply the software engineering documentation paradigm) | Data providers
Entity resolution | Wait for natural convergence | Community driven
Correctness of data | 1) Use ChangeSets to specify the delta/difference between versions. 2) Use registration mechanisms to guarantee that registered users of the dataset always get "fresh" data. | Data providers
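
The idea of publishing a delta between dataset versions can be sketched as follows, using the DBpedia example from earlier. This treats a dataset as a plain set of triples and is not an implementation of any particular ChangeSet vocabulary; the identifiers and property names are hypothetical, not actual DBpedia terms.

```python
def changeset(old: set, new: set) -> dict:
    """Describe the difference between two versions of a dataset."""
    return {"removed": old - new, "added": new - old}


def apply_changeset(triples: set, delta: dict) -> set:
    """Bring an old copy up to date without downloading the full dataset."""
    return (triples - delta["removed"]) | delta["added"]


# Hypothetical triples standing in for two dataset versions.
v3_2 = {
    ("United_States", "leader", "George W. Bush"),
    ("United_States", "capital", "Washington, D.C."),
}
v3_3 = {
    ("United_States", "leader", "Barack Obama"),
    ("United_States", "capital", "Washington, D.C."),
}

delta = changeset(v3_2, v3_3)
updated = apply_changeset(v3_2, delta)
```

A consumer holding version 3.2 would only need `delta` to arrive at version 3.3; the unchanged capital triple is never retransmitted.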

Legal

Problem | Solution | Responsibility
Intellectual property rights | Licenses to protect data, e.g. permitting non-commercial use only, with legal action possible against other uses | (not specified)

Economic

Problem | Solution | Responsibility
Open == Free? (example: disruption of the existing economic model, e.g. UK geo data through the Ordnance Survey) | Not necessarily free. 1) Work out better business plans/models with flexible charging policies (usage-based, capacity-based, purpose-based, etc.); 2) flexible control mechanisms, e.g. free access for a trusted third-party think tank, which makes use of the dataset and provides financial insights; 3) better visibility in search, e.g. Yahoo SearchMonkey will display directly in search results any product information expressed in RDFa with GoodRelations http://www.mail-archive.com/public-lod@w3.org/msg03000.html | (not specified)
Revealing the data will lose the organization its competitive edge | Licensing mechanisms such as Creative Commons enable entities to give a "Non-Commercial" use license to their data, so that a violation (e.g. stealing a trade secret) can be pursued in a court of law | Community / Law enforcement agencies

Social

Problem | Solution | Responsibility
Unforeseen consequences such as potential privacy concerns | Have different degrees of openness (selective openness): 1. Anonymize the data. 2. Supply aggregate statistics or derivative facts (don't supply raw data). 3. Personalize the raw data that you are opening up, making available only the data that is applicable. | Data providers
Making sure the original source gets credit for the data | Social compensation: attribution as the content creator requests it. | Data providers
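
Two of the selective-openness options in the table above, withholding identifying fields and supplying aggregate statistics instead of raw data, might be sketched as below. The field names, the sample records and the `min_group_size` threshold are assumptions made for this example. Dropping identifiers alone does not guarantee anonymity, since the remaining fields can still identify a person.

```python
from collections import Counter


def drop_fields(records: list, private_fields: set) -> list:
    """Release each record without the fields the provider keeps private."""
    return [
        {k: v for k, v in record.items() if k not in private_fields}
        for record in records
    ]


def aggregate_counts(records: list, field: str, min_group_size: int = 5) -> dict:
    """Release counts per value, suppressing groups too small to hide in."""
    counts = Counter(record[field] for record in records)
    return {value: n for value, n in counts.items() if n >= min_group_size}


# Hypothetical usage records, for illustration only.
records = [
    {"user": "u1", "city": "Galway", "query": "a"},
    {"user": "u2", "city": "Galway", "query": "b"},
    {"user": "u3", "city": "Troy", "query": "c"},
]

released = drop_fields(records, {"user"})
by_city = aggregate_counts(records, "city", min_group_size=2)
```

Here the single record from Troy is suppressed from the aggregate because a group of one would point at an individual.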


Other solutions:

  • Users make enough noise e.g. Facebook UI change

Questions for Discussion

  1. API or raw data: is having an API the same as having open data?
  2. Net-neutrality debate? Child protection software - metadata validation

Conclusion

  • There are existing or feasible ad-hoc solutions to many obstacles
  • A coherent framework is needed to combine all of the knowledge from different domains
  • How to achieve this is itself a complicated research problem
  • Multidisciplinary expertise is definitely required

Presentation Slides

http://dig.csail.mit.edu/2009/Talks/0725-RPISS-os
