Provenance
From Semantic Portal Wiki
Overview
The process that led to some data is called the provenance of that data. A provenance architecture is the software architecture for a system that will provide the necessary functionality to record, store and use process documentation to determine the provenance of data items.
"The motivation for understanding the provenance of works of art is also applicable to data we see on the Web. With the proliferation of data on the Web, questions such as 'Where did this data come from?', 'Who else is using this data?', and 'Why is this piece of data here?' are becoming increasingly common" (Tan 2004).
"Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. It is a moot point on where the boundary between provenance information and generic metadata lies. In some cases, there is little to distinguish the two and provenance is subsumed into the general metadata infrastructure." (Simmhan et al. 2005)
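To make the "record, store and use process documentation" idea and the notion of a derivation history concrete, here is a minimal sketch. It is not taken from any system cited on this page; the names `ProcessRecord`, `ProvenanceStore`, `derivation_history` and `original_sources` are hypothetical, and it assumes each data item is generated by at most one recorded process run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessRecord:
    """Hypothetical documentation of one process run: what it consumed and produced."""
    process: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


class ProvenanceStore:
    """Records process documentation and uses it to answer provenance queries."""

    def __init__(self) -> None:
        # Maps each data item to the process run that generated it.
        self._producer: dict[str, ProcessRecord] = {}

    def record(self, rec: ProcessRecord) -> None:
        for item in rec.outputs:
            self._producer[item] = rec

    def derivation_history(self, item: str) -> list[ProcessRecord]:
        """Process runs that led to `item`, walking back towards its original sources."""
        history: list[ProcessRecord] = []
        seen: set[ProcessRecord] = set()
        stack = [item]
        while stack:
            rec = self._producer.get(stack.pop())
            if rec is None or rec in seen:
                continue  # an original source, or a run already visited
            seen.add(rec)
            history.append(rec)
            stack.extend(rec.inputs)
        return history

    def original_sources(self, item: str) -> set[str]:
        """Items in the history of `item` that no recorded process generated."""
        if item not in self._producer:
            return {item}
        return {
            src
            for rec in self.derivation_history(item)
            for src in rec.inputs
            if src not in self._producer
        }


store = ProvenanceStore()
store.record(ProcessRecord("clean", ("raw.csv",), ("clean.csv",)))
store.record(ProcessRecord("join", ("clean.csv", "lookup.csv"), ("joined.csv",)))
store.record(ProcessRecord("plot", ("joined.csv",), ("figure.png",)))

print([r.process for r in store.derivation_history("figure.png")])  # ['plot', 'join', 'clean']
print(sorted(store.original_sources("figure.png")))  # ['lookup.csv', 'raw.csv']
```

The history answers "Where did this data come from?" by following recorded process runs backwards until it reaches items that nothing recorded has produced.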
Research Themes
Workflow Provenance
Workflow provenance has emerged as an important consideration in e-science (Lanter 1990; Frew and Bose 2001) and in the grid community (Foster et al. 2002; Muniswamy-Reddy et al. 2006; Moreau and Ibbotson 2006). It focuses on the history of dataset derivation at a coarse level of granularity. Workflow provenance is particularly important in the e-science domain, where a substantial set of requirements is emerging (Miles et al. 2007). Growing interest in provenance metadata from different domains, using different technologies, has led to several provenance dialects. Notably, all of the 14 teams in the second provenance challenge used their own distinct provenance representations, and issues arose when translating between them. Useful surveys are available (Simmhan et al. 2005; Bose and Frew 2005).
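A common way to capture provenance at this coarse granularity is to wrap each workflow step so that every run logs which named datasets it read and wrote, without looking inside the data. The sketch below assumes a shared dictionary "workspace" and a hypothetical `step` decorator; it is illustrative only and does not reproduce the mechanism of any system cited above.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory provenance log; a real system would persist it.
PROVENANCE_LOG: list[dict[str, Any]] = []


def step(name: str, reads: list[str]) -> Callable:
    """Wrap a workflow step so each run logs which named datasets it read and wrote."""

    def decorate(func: Callable[..., dict[str, Any]]) -> Callable[[dict[str, Any]], dict[str, Any]]:
        @functools.wraps(func)
        def run(workspace: dict[str, Any]) -> dict[str, Any]:
            started = time.time()
            produced = func(**{k: workspace[k] for k in reads})
            workspace.update(produced)
            # Coarse granularity: only dataset names are logged, not their contents.
            PROVENANCE_LOG.append({
                "step": name,
                "inputs": list(reads),
                "outputs": sorted(produced),
                "started": started,
                "finished": time.time(),
            })
            return produced

        return run

    return decorate


@step("normalize", reads=["readings"])
def normalize(readings: list[float]) -> dict[str, Any]:
    peak = max(readings)
    return {"normalized": [v / peak for v in readings]}


@step("summarize", reads=["normalized"])
def summarize(normalized: list[float]) -> dict[str, Any]:
    return {"mean": sum(normalized) / len(normalized)}


workspace: dict[str, Any] = {"readings": [2.0, 4.0, 8.0]}
for workflow_step in (normalize, summarize):
    workflow_step(workspace)

for entry in PROVENANCE_LOG:
    print(entry["step"], entry["inputs"], "->", entry["outputs"])
```

Logging only dataset names keeps the record small, but it cannot say which input values explain a particular output value; that finer question is what the database-oriented work below addresses.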
Resources
- EU Provenance Project (service, grid), http://www.gridprovenance.org/
- Zoom Project, http://zoomuserviews.db.cis.upenn.edu/cgi-bin/pmwiki.php
- http://isweb.uni-koblenz.de/Research/MetaKnowledge
- IPAW 2006, http://www.ipaw.info/ipaw06/
- Symposium on Provenance in Scientific Workflows, October 13-17, 2008, http://wiki.esi.ac.uk/ProvenanceInWorkflows
References
- David P. Lanter. Lineage in GIS: The Problem and a Solution, NCGIA (90-6), 1990
- James Frew, Rajendra Bose. Earth System Science Workbench: A Data Management Infrastructure for Earth Science Products, SSDBM pp.180-189, 2001
- Ian T. Foster, Jens-S. Vockler, Michael Wilde, Yong Zhao. Chimera: A Virtual Data System for Representing, Querying, and Automating Data Derivation, SSDBM pp.37-46, 2002
- Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, Margo I. Seltzer. Provenance-Aware Storage Systems, USENIX Annual Technical Conference, General Track pp.43-56, 2006
- Luc Moreau, John Ibbotson. The EU Provenance Project: Enabling and Supporting Provenance in Grids for Complex Problems (Final Report), The EU Provenance Consortium, 2006
- Simon Miles, Paul T. Groth, Miguel Branco, Luc Moreau. The Requirements of Using Provenance in e-Science Experiments, J. Grid Comput. 5 (1) pp.1-25, 2007
- Yogesh Simmhan, Beth Plale, Dennis Gannon. A survey of data provenance in e-science, SIGMOD Record 34 (3) pp.31-36, 2005
- Rajendra Bose, James Frew. Lineage retrieval for scientific data processing: a survey, ACM Comput. Surv. 37 (1) pp.1-28, 2005
- Yolanda Gil, Ewa Deelman, Mark H. Ellisman, Thomas Fahringer, Geoffrey Fox, Dennis Gannon, Carole A. Goble, Miron Livny, Luc Moreau, Jim Myers. Examining the Challenges of Scientific Workflows, IEEE Computer 40 (12) pp.24-32, 2007
Protocol for Bioinformatics
The provenance of bioinformatics processes (protocols) can be considered a specific branch of workflow provenance.
References
- Lance Feagan, Justin Rohrer, Alexander Garrett, Heather Amthauer, Ed Komp, David Johnson, Adam Hock, Terry Clark, Gerald Lushington, Gary Minden, Victor Frost. Bioinformatics process management: information flow via a computational journal, Source Code for Biology and Medicine 2 (9), 2007
- Joan C. Bartlett, Elaine G. Toms. Developing a protocol for bioinformatics analysis: An integrated information behavior and task analysis approach, Journal of the American Society for Information Science 56 (5) pp.469-482, 2005
- Shawn Hoon, Kiran Kumar Ratnapu, Jer-ming Chia, Balamurugan Kumarasamy, Xiao Juguang, Michele Clamp, Arne Stabenau, Simon Potter, Laura Clarke, Elia Stupka. Biopipe: A Flexible Framework for Protocol-Based Bioinformatics Analysis, Genome Research 13 pp.1904-1915, 2003
Data Provenance (database)
Data provenance was pioneered within the database community (Buneman et al. 2001; Cui et al. 2000; Woodruff and Stonebraker 1997). Data provenance research focuses on issues of importance in database settings and has been inspired by computational methods that databases support and make practical. For example, why-provenance finds the source tuples that explain why a tuple is derived, and where-provenance finds the portion of the sources that was copied into a given portion of the derived tuple. This kind of provenance can be represented as a specialized workflow step whose action is a declarative query paired with a declarative inverse function. Useful surveys are available (Glavic and Dittrich 2007; Tan 2007). Notably, some data provenance work has been generalized to workflow provenance, e.g. in e-science, while the narrower sense of "data provenance" remains in the database domain.
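The why/where distinction can be illustrated with a toy evaluator for a single join-and-project query over two hypothetical relations, `Emp` and `Dept`. Each answer is annotated with why-provenance (a set of witnesses, each witness being a set of source tuples that together justify the answer, in the spirit of Buneman et al. 2001) and where-provenance (the source field its value was copied from). This sketch handles only this one query; the systems surveyed above propagate such annotations through general relational operators.

```python
from collections import defaultdict

# Hypothetical source relations; every tuple has an id so provenance can point at it.
EMP = {
    "e1": {"name": "Ann", "dept": "db"},
    "e2": {"name": "Bob", "dept": "db"},
    "e3": {"name": "Cy", "dept": "ai"},
}
DEPT = {
    "d1": {"dept": "db", "building": "B1"},
    "d2": {"dept": "ai", "building": "B2"},
}


def buildings_with_provenance():
    """Evaluate  SELECT d.building FROM Emp e JOIN Dept d ON e.dept = d.dept
    and annotate every answer with its why- and where-provenance."""
    # answer -> set of witnesses; a witness is a set of source tuples that suffices
    why = defaultdict(set)
    # answer -> source locations (relation, tuple id, attribute) its value was copied from
    where = defaultdict(set)
    for eid, e in EMP.items():
        for did, d in DEPT.items():
            if e["dept"] == d["dept"]:
                answer = (d["building"],)
                why[answer].add(frozenset({("Emp", eid), ("Dept", did)}))
                where[answer].add(("Dept", did, "building"))
    return dict(why), dict(where)


why, where = buildings_with_provenance()
for answer in sorted(why):
    witnesses = sorted(sorted(w) for w in why[answer])
    print(answer, "why:", witnesses, "where:", sorted(where[answer]))
```

The answer `("B1",)` has two witnesses, because either Ann's or Bob's tuple combined with `d1` is enough to derive it, while its where-provenance is the single field `Dept.d1.building` from which the value was copied.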
Researchers
- Peter Buneman
- Wang Chiew Tan
- Allison Woodruff
- Michael Stonebraker
- Yingwei Cui
- Jennifer Widom
- Janet L. Wiener
Resources
- Principles of Provenance (PrOPr), http://www.cis.upenn.edu/~plclub/propr/
- a nice tutorial: http://www.soe.ucsc.edu/~wctan/papers/2007/DBProvenance.ppt
- a survey: http://www.dcs.gla.ac.uk/~karen/Provenance/apps.html
References
- Peter Buneman, Sanjeev Khanna, Wang Chiew Tan. Why and Where: A Characterization of Data Provenance, ICDT pp.316-330, 2001
- Yingwei Cui, Jennifer Widom, Janet L. Wiener. Tracing the lineage of view data in a warehousing environment, ACM Trans. Database Syst. 25 (2) pp.179-227, 2000
- Allison Woodruff, Michael Stonebraker. Supporting Fine-grained Data Lineage in a Database Visualization Environment, ICDE pp.91-102, 1997
- Wang Chiew Tan. Research Problems in Data Provenance, IEEE Data Eng. Bull. 27 (4) pp.45-52, 2004
- Boris Glavic, Klaus R. Dittrich. Data Provenance: A Categorization of Existing Approaches, BTW pp.227-241, 2007
- Wang Chiew Tan. Provenance in Databases: Past, Current, and Future, IEEE Data Eng. Bull. 30 (4) pp.3-12, 2007
Knowledge Provenance (AI)
Knowledge provenance (McGuinness and Pinheiro da Silva 2004; Fox and Huang 2004) focuses on issues of importance in knowledge base settings. These typically include the issues that matter in database settings, but also concerns arising from reasoning (potentially hybrid reasoning). For example, applications may need provenance for the results of text analytic programs that are integrated into knowledge bases and processed by first-order reasoners (Murdock et al. 2006). Provenance in distributed information systems (Weitzner et al. 2006) is another interesting direction in provenance research. Unlike many e-science workflows, which simply compose services into a sequence, the workflows in such systems also involve many interactive communication protocols.
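Provenance arising from reasoning can be sketched as a rule engine that records, for every derived fact, which rule produced it and from which premises, so that an answer can be explained all the way down to the sources of the asserted facts. This is loosely in the spirit of the justification traces discussed by McGuinness and Pinheiro da Silva (2004), but it is not the Inference Web or PML format; the triple encoding, the single rule and the source labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

Triple = tuple[str, str, str]


@dataclass(frozen=True)
class Justification:
    rule: str                      # "asserted" for facts taken directly from a source
    premises: tuple[Triple, ...]   # facts the rule was applied to
    source: Optional[str] = None   # origin of an asserted fact


def forward_chain(asserted: dict[Triple, str]) -> dict[Triple, Justification]:
    """Apply one rule, (x type C) & (C subClassOf D) => (x type D), until nothing new
    is derived, keeping the first justification found for every fact."""
    just = {t: Justification("asserted", (), src) for t, src in asserted.items()}
    changed = True
    while changed:
        changed = False
        facts = list(just)
        for x, p1, c in facts:
            if p1 != "type":
                continue
            for c2, p2, d in facts:
                new = (x, "type", d)
                if p2 == "subClassOf" and c2 == c and new not in just:
                    just[new] = Justification("type-inheritance", ((x, p1, c), (c2, p2, d)))
                    changed = True
    return just


def explain(fact: Triple, just: dict[Triple, Justification], depth: int = 0) -> list[str]:
    """Render the justification tree of `fact`, down to the sources of asserted facts."""
    j = just[fact]
    origin = f" [{j.source}]" if j.source else ""
    lines = ["  " * depth + f"{fact} <- {j.rule}{origin}"]
    for premise in j.premises:
        lines.extend(explain(premise, just, depth + 1))
    return lines


kb = forward_chain({
    ("tweety", "type", "penguin"): "source-A",
    ("penguin", "subClassOf", "bird"): "source-B",
    ("bird", "subClassOf", "animal"): "source-C",
})
print("\n".join(explain(("tweety", "type", "animal"), kb)))
```

The printed tree shows that the conclusion depends on statements from three different sources, which is the kind of information a user needs before deciding whether to trust a reasoner's answer.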
References
- Deborah L. McGuinness, Paulo Pinheiro da Silva. Explaining answers from the Semantic Web: the Inference Web approach, Journal of Web Semantics 1 (4) pp.397-413, 2004
- Mark S. Fox, Jingwei Huang. Knowledge Provenance, Canadian Conference on AI pp.517-523, 2004
- J. William Murdock, Deborah L. McGuinness, Paulo Pinheiro da Silva, Christopher A. Welty, David A. Ferrucci. Explaining Conclusions from Diverse Knowledge Sources, Proceedings of the 5th International Semantic Web Conference (ISWC2006) pp.861-872, 2006
- Daniel J. Weitzner, Harold Abelson, Tim Berners-Lee, Chris Hanson, James A. Hendler, Lalana Kagal, Deborah L. McGuinness, Gerald J. Sussman, K. Krasnow Waterman. Transparent Accountable Data Mining: New Strategies for Privacy Protection, Proceedings of AAAI Spring Symposium on The Semantic Web meets eGovernment, 2006
Research Directions
Provenance Metadata
- reference information (a.k.a. digital objects, statements)
- reference and classify the entities involved in information manipulation
- annotate provenance attributes
- represent the information manipulation process in terms of plan and log
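One way to read the four items above as a data model: identify each piece of information, classify the entities involved (here artifacts and agents, loosely following the node kinds of the Open Provenance Model), attach provenance attributes, and keep both a plan (what was meant to run) and a log (what actually ran) so the two can be compared. The class names, fields and comparison rule below are hypothetical assumptions, not part of OPM or PML.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    """A referenced piece of information or participant, classified by kind."""
    id: str
    kind: str  # hypothetical classification, e.g. "artifact", "process", "agent"
    attributes: dict[str, str] = field(default_factory=dict)  # provenance annotations


@dataclass
class Step:
    """One information-manipulation step, used both in plans and in logs."""
    process: str
    used: list[str]
    generated: list[str]
    controlled_by: Optional[str] = None


def compare(plan: list[Step], log: list[Step]) -> list[str]:
    """Report where the executed log deviates from the plan (matching steps by name)."""
    planned = {s.process: s for s in plan}
    executed = {s.process: s for s in log}
    report = [f"not executed: {n}" for n in sorted(planned.keys() - executed.keys())]
    report += [f"not in plan: {n}" for n in sorted(executed.keys() - planned.keys())]
    for n in sorted(planned.keys() & executed.keys()):
        if planned[n].used != executed[n].used:
            report.append(f"different inputs: {n}")
    return report


entities = {
    "raw": Entity("raw", "artifact", {"format": "csv"}),
    "alice": Entity("alice", "agent", {"role": "analyst"}),
}
plan = [Step("clean", ["raw"], ["clean"]), Step("analyze", ["clean"], ["result"])]
log = [
    Step("clean", ["raw"], ["clean"], controlled_by="alice"),
    Step("analyze", ["clean", "patch"], ["result"], controlled_by="alice"),
    Step("plot", ["result"], ["figure"], controlled_by="alice"),
]
print(compare(plan, log))  # ['not in plan: plot', 'different inputs: analyze']
```

Using the same `Step` shape for plan and log is a deliberate simplification that makes the comparison trivial; a richer model would also record timing and the agent responsible for each deviation.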
References
- Luc Moreau, Juliana Freire, Joe Futrelle, Robert E. McGrath, Jim Myers, Patrick Paulson. The Open Provenance Model, University of Southampton, 2007
- Deborah L. McGuinness, Li Ding, Paulo Pinheiro da Silva, Cynthia Chang. PML 2: A Modular Explanation Interlingua, Proceedings of the 2007 Workshop on Explanation-aware Computing (ExaCt-2007), 2007
Provenance Computation
- classify computations over provenance metadata
- list application domains and scenarios for provenance
- provenance metadata management (storage, access, query)
- provenance-aware user interaction
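For the storage, access and query item, a relational store with a recursive query is one simple option. The sketch keeps derivation edges in an SQLite table (the schema and function names are hypothetical choices) and answers the forward question from the overview, "Who else is using this data?", by finding everything derived, directly or transitively, from a given item.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store; one row per (derived item, source item) edge."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE derivation ("
        " derived TEXT NOT NULL, source TEXT NOT NULL, process TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX derivation_by_source ON derivation(source)")
    return conn


def record(conn: sqlite3.Connection, derived: str, sources: list[str], process: str) -> None:
    conn.executemany(
        "INSERT INTO derivation (derived, source, process) VALUES (?, ?, ?)",
        [(derived, s, process) for s in sources],
    )


def downstream(conn: sqlite3.Connection, item: str) -> list[str]:
    """Every item derived, directly or transitively, from `item`."""
    rows = conn.execute(
        """
        WITH RECURSIVE impacted(name) AS (
            SELECT derived FROM derivation WHERE source = ?
            UNION
            SELECT d.derived FROM derivation AS d JOIN impacted AS i ON d.source = i.name
        )
        SELECT name FROM impacted ORDER BY name
        """,
        (item,),
    ).fetchall()
    return [name for (name,) in rows]


conn = open_store()
record(conn, "clean.csv", ["raw.csv"], "clean")
record(conn, "joined.csv", ["clean.csv", "lookup.csv"], "join")
record(conn, "figure.png", ["joined.csv"], "plot")
record(conn, "report.pdf", ["figure.png", "notes.txt"], "render")
print(downstream(conn, "lookup.csv"))  # ['figure.png', 'joined.csv', 'report.pdf']
```

Using `UNION` rather than `UNION ALL` in the recursive query makes SQLite discard rows it has already produced, so the query terminates even if the recorded graph accidentally contains a cycle.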
Provenance Systems
Literature Survey
| Property | Value |
|---|---|
| dc:creator | Li Ding |
| dc:relation | Provenance |
| dcterms:created | 2009-05-18 |
| foaf:name | Provenance |

