Open Provenance Model Discussion
From Semantic Portal Wiki
James Michaelis' presentation on the Open Provenance Model (OPM) described a developing standard for modeling provenance. The purpose of the OPM is to reconcile and consolidate the disparate efforts of a variety of researchers. However, the paper did not review the provenance of the provenance research that culminated in the decision to create the OPM. This page was created to fill that gap by accumulating evidence to answer the following questions.
Questions
What applications need provenance?
Mostly scientific (see "A Survey of Data Provenance in e-Science" [1])
experiment reproducibility.
- Protein Compressibility Experiment (groth2005recording)
- warehousing environment (cui2000tracing)
- scientific data processing (brose2005recording)
Session 2 of IPAW 06: http://www.ipaw.info/ipaw06/programme.html
A Survey of Data Provenance in e-Science [2]
"The perceived importance of data lineage has grown in step with the increased volume and widened dissemination of processed data sets." (woodruff1997supporting)
Geographic information systems, materials engineering, biological/biomedical data, intellectual property (simmhan2005survey)
Miles et al. [8] study use cases for recording provenance in e-Science experiments:
S. Miles, P. Groth, M. Branco, and L. Moreau, "The requirements of recording and using provenance in e-Science experiments," in Technical Report: Electronics and Computer Science, University of Southampton, 2005.
Genomics:
- H. V. Jagadish and F. Olken, "Database Management for Life Sciences Research," in SIGMOD Record, vol. 33, 2004, pp. 15-20.
- H. Müller and F. Naumann, "Data Quality in Genome Databases," in IQ, 2003, pp. 269-284.
What benefits are there for the use of provenance?
"it gives a trace"
Provenance-Based Validation of E-Science Experiments: "After an experiment has been executed, it is useful for a scientist to verify that the execution was performed correctly or is compatible with some existing experimental criteria or standards. Scientists may also want to review and verify experiments performed by their colleagues. There are no existing frameworks for validating such experiments in today’s e-Science systems." [3]
How many provenance models are in use today?
Session 6 of IPAW 06: http://www.ipaw.info/ipaw06/programme.html
PReServ: Provenance Recording for Services [4]
PReP, the Provenance Recording Protocol [5]
Trio, its data model TDM, and its query language TriQL (widom2005trio)
Scientific data standards (e.g., the Spatial Data Transfer Standard [9], the Spatial Archive and Interchange Format [13], and the draft Content Standard for Digital Spatial Metadata [5]) generally incorporate some kind of support for lineage (woodruff1997supporting).
Annotated bibliography
[11] K. Renaud, "Data Provenance and Annotation Resource Home Page," http://www.dcs.gla.ac.uk/~karen/Provenance, Department of Computer Science, University of Glasgow, 2005.
[12] "eBank UK study of provenance," http://www.ukoln.ac.uk/projects/ebank-uk/provenance, eBank UK, 2005.
IPAW 06 and 08
First and Second Provenance Challenges

