Raw data now
From Semantic Portal Wiki
Group Members
Dongwoo Kim
Sheila Kinsella
Oshani Seneviratne
Zhenning Shangguan
Problem Statement
A great deal of data is locked up in silos. We are trying to formalise how to convince data providers to open this data for the benefit of the community as a whole, and how to make doing so easy for them.
Motivating Scenario
The Economist laments: "The problem with today's social networks is that they are often closed to the outside web."
Illustration by David Simonds. Reproduced from "Everywhere and nowhere", The Economist, May 19, 2008.
Potential Benefits
- Increased Visibility: through search engine discoverability (e.g. GoodRelations data embedded in RDFa).
- Transparency, and hence customer loyalty.
- Diverse Added Value: different communities can make different use of the same data to suit different needs.
- For government/non-profit organisations: other people can analyse and visualize the data and produce interesting results for the data provider, which could be used for decision making.
Potential Concerns
Technical
- Formats: exporters/converters
- Documentation: how to extract meaning out of the data
- Entity resolution: how do we match entities/concepts in order to interlink datasets?
  - wait for natural convergence?
  - a centralised repository, e.g. OKKAM
  - ID Commons
  - FOAF+SSL
- Correctness of data
  - Example: in version 3.2 of DBpedia the president of the US was G. W. Bush, whereas in the later version 3.3 the president of the US should be Barack Obama.
Legal
- Intellectual property rights
  - licenses to protect data, e.g. the data can be used only for non-commercial purposes, otherwise the provider can take legal action
Economic
- Will they lose money from this? Is Open == Free? For example, opening up the UK geographical data obtained through the Ordnance Survey is hard because of the specific economic model it has adopted in the past.
- Trade secrets: will they lose out to the competition, and can the competition gain an advantage from a company making its data available?
Social
- Privacy: unforeseen consequences, e.g. AOL not anticipating the privacy breaches caused by releasing its usage data
- Data is open to abuse, e.g. mining to get customer information
- Make sure the original source gets credit for the data
  - credit includes various aspects: social reputation, economic compensation
Possible Solutions
- Allow the data providers to provide the data in their own format, and let tool builders develop tools to convert it to other formats or develop visualization techniques.
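As a rough illustration of the kind of converter an independent tool developer might write, the sketch below turns a provider's CSV export into N-Triples lines. The `http://example.org/` namespace, the function names, and the sample data are all hypothetical and not taken from this page. It assumes that identifiers and column names are already safe to embed in an IRI.

```python
import csv
import io

# Hypothetical namespace for illustration only; a real provider would use its own.
BASE = "http://example.org/"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str, id_column: str) -> list[str]:
    """Emit one triple per non-empty cell, using the id column as the subject.

    Assumes ids and column names contain only IRI-safe characters.
    """
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}resource/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the id itself and empty cells
            predicate = f"<{BASE}property/{column}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


sample = "id,name,city\n1,Alice,Galway\n2,Bob,\n"
lines = csv_to_ntriples(sample, "id")
for line in lines:
    print(line)
```

The provider keeps publishing CSV unchanged; the conversion lives entirely in the third-party tool, which is the division of responsibility this bullet proposes.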
Technical
| Problem | Solution | Responsibility |
|---|---|---|
| Different Formats | Exporters and Converters | Independent tool developers |
| Documentation: How to extract meaning out of the data | Self describing data (apply the software engineering documentation paradigm) | Data providers |
| Entity resolution | Wait for natural convergence | Community driven |
| Correctness of Data | 1) Use a ChangeSet to specify the delta/difference between versions. 2) Use registration mechanisms to guarantee that registered users of the dataset always get "fresh" data. | Data providers |
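The ChangeSet idea in the last row can be sketched as a pair of set differences over triples. This is a minimal illustration and does not use any particular ChangeSet vocabulary. The `ex:` identifiers are made up to mirror the DBpedia president example and are not actual DBpedia triples.

```python
# A triple is modelled as a plain (subject, predicate, object) tuple of strings.
Triple = tuple[str, str, str]


def changeset(old: set[Triple], new: set[Triple]) -> dict[str, set[Triple]]:
    """Describe the delta between two versions as removals and additions."""
    return {"removals": old - new, "additions": new - old}


def apply_changeset(data: set[Triple], delta: dict[str, set[Triple]]) -> set[Triple]:
    """Bring an older copy up to date by applying a delta to it."""
    return (data - delta["removals"]) | delta["additions"]


# Illustrative data only (hypothetical identifiers).
v_old = {
    ("ex:United_States", "ex:president", "ex:George_W_Bush"),
    ("ex:United_States", "ex:capital", "ex:Washington_DC"),
}
v_new = {
    ("ex:United_States", "ex:president", "ex:Barack_Obama"),
    ("ex:United_States", "ex:capital", "ex:Washington_DC"),
}

delta = changeset(v_old, v_new)
updated = apply_changeset(v_old, delta)
print(delta["removals"])
print(delta["additions"])
```

A consumer holding the older version then needs only the delta, rather than a full re-download, to catch up with the newer one.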
Legal
| Problem | Solution | Responsibility |
|---|---|---|
| Intellectual property rights | licenses to protect data, e.g. can only use for non-commercial, otherwise can take legal action | |
Economic
| Problem | Solution | Responsibility |
|---|---|---|
| Open == Free? (example: disruption of the existing economic model, e.g. UK geo data through the Ordnance Survey) | Not necessarily free. 1) Work out better business plans/models with flexible charging policies (usage-based, capacity-based, purpose-based, etc.). 2) Use flexible control mechanisms, e.g. make the dataset free for a trusted third-party think tank, let it make use of the dataset, and have it provide financial insights. 3) Gain better visibility in search, e.g. Yahoo SearchMonkey will display directly in search results any product information expressed in RDFa with GoodRelations: http://www.mail-archive.com/public-lod@w3.org/msg03000.html | |
| Revealing the data will cost the organization its competitive edge | Licensing mechanisms such as Creative Commons enable entities to grant a "Non-Commercial" use license for their data, whereby a violation (e.g. stealing a trade secret) will be enforceable in a court of law | Community / Law enforcement agencies |
Social
| Problem | Solution | Responsibility |
|---|---|---|
| Unforeseen consequences like potential privacy concerns | Have different degrees of openness (selective openness): 1. Anonymize the data. 2. Supply aggregate statistics or derivative facts (don't supply raw data). 3. Personalize the raw data that you are opening up, so that only the applicable data is made available. | Data providers |
| Making sure the original source gets credit for the data | Social compensation: attribution in the way the content creator requests it. | Data providers |
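The first two options under selective openness can be sketched as follows. This is a minimal illustration, not a complete anonymization scheme: the function names, the secret key, and the sample records are hypothetical. Replacing identifiers alone may not prevent re-identification when the remaining fields are revealing, so the sketch also shows publishing only aggregate counts above a threshold.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash so it cannot be looked up directly."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def aggregate_counts(records: list[dict], field: str, min_count: int) -> dict[str, int]:
    """Publish counts per value of `field`, suppressing groups smaller than `min_count`."""
    counts = Counter(record[field] for record in records)
    return {value: n for value, n in counts.items() if n >= min_count}


# Illustrative records only.
records = [
    {"user": "alice", "city": "Galway"},
    {"user": "bob", "city": "Galway"},
    {"user": "carol", "city": "Troy"},
]
key = b"hypothetical-secret-kept-by-the-provider"

released = [{"user": pseudonymize(r["user"], key), "city": r["city"]} for r in records]
summary = aggregate_counts(records, "city", min_count=2)
print(summary)
```

Here the single-record group is dropped from the summary, since a count of one could point to an individual.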
Other solutions:
- Users make enough noise, e.g. as with the Facebook UI change
Questions for Discussion
- API or raw data: is having an API the same as having open data?
- Net-neutrality debate? Child protection software - metadata validation
Conclusion
- There are existing or feasible ad-hoc solutions to many obstacles
- A coherent framework is needed to combine all of the knowledge from different domains
- How to achieve this is itself a complicated research problem
- Multidisciplinary expertise is definitely required
Presentation Slides
http://dig.csail.mit.edu/2009/Talks/0725-RPISS-os


