Provenance


  • description: A survey of provenance research frontier
  • creator(s): Li Ding
  • created: 2009/05/18
  • relation(s): Category:Provenance
  • modified: 2010-3-28
Note: This is an incomplete survey of provenance-related research. You can contact Li Ding or register on this wiki to improve it.



Overview

For a general definition of provenance, see the Wikipedia article wikipedia:Provenance. The following definitions come from several publications:

  • The process that led to some data is called the provenance of that data. A provenance architecture is the software architecture for a system that will provide the necessary functionality to record, store and use process documentation to determine the provenance of data items. (Miles et al. 2007)
  • The motivation for understanding the provenance of works of art is also applicable to data we see on the Web. With the proliferation of data on the Web, questions such as "Where did this data come from?", "Who else is using this data?", and "Why is this piece of data here?" are becoming increasingly common. (Tan 2004)
  • Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. It is a moot point on where the boundary between provenance information and generic metadata lies. In some cases, there is little to distinguish the two and provenance is subsumed into the general metadata infrastructure. (Simmhan et al. 2005)

Research Themes

Workflow Provenance

Workflow provenance has emerged as an important consideration in e-science (Lanter 1990; Frew and Bose 1991) and in the grid community (Foster et al. 2002; Muniswamy-Reddy et al. 2006; Moreau and Ibbotson 2006). To address requirements from e-science (Miles et al. 2007), workflow provenance research focuses on process, recording the history of data derivation. Growing interest in workflow provenance from different domains using different technologies has led to several provenance representation dialects. For example, the 14 teams in the second provenance challenge each used their own distinct provenance representation, and it was difficult to integrate provenance metadata across representations. For a more general overview of workflow provenance, see the surveys by Simmhan et al. (2005) and Bose and Frew (2005).
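The idea of recording the history of data derivation can be illustrated with a minimal sketch. The `ProvenanceLog` class, its method names, and the file and process names below are all hypothetical, chosen only for illustration; they do not correspond to any of the systems or representations cited above.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceLog:
    """Hypothetical in-memory log of workflow steps (not a real system's API)."""

    # Maps each derived item to (process name, list of input items).
    derivations: dict = field(default_factory=dict)

    def record(self, process, inputs, output):
        """Record that `process` derived `output` from `inputs`."""
        self.derivations[output] = (process, list(inputs))

    def lineage(self, item):
        """Return every item that `item` was derived from, transitively."""
        seen = set()
        stack = [item]
        while stack:
            current = stack.pop()
            # Items with no recorded derivation are original sources.
            _, inputs = self.derivations.get(current, (None, []))
            for src in inputs:
                if src not in seen:
                    seen.add(src)
                    stack.append(src)
        return seen


# Assumed three-step workflow: calibrate, regrid, plot.
log = ProvenanceLog()
log.record("calibrate", ["raw.dat"], "calibrated.dat")
log.record("regrid", ["calibrated.dat", "grid.cfg"], "regridded.dat")
log.record("plot", ["regridded.dat"], "figure.png")

ancestors = log.lineage("figure.png")
```

Here the log is written as each step runs, and the lineage query walks it backwards from a data product to its original sources. Real workflow provenance systems record far more (agents, parameters, timestamps) and differ in how they represent it, which is the integration difficulty noted above.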

researchers
resources
references

General:

  1. A survey of data provenance in e-science
  2. Lineage retrieval for scientific data processing: a survey
  3. Provenance-aware storage systems
  4. The EU Provenance Project: enabling and supporting provenance in grids for complex problems (final report)
  5. The requirements of using provenance in e-science experiments
  6. Examining the challenges of scientific workflows
  7. Chimera: a virtual data system for representing, querying, and automating data derivation

Domain (GIS):

  1. Earth System Science Workbench: a data management infrastructure for earth science products
  2. Lineage in GIS: the problem and a solution

Domain (BIO): bioinformatics process is a specific branch of workflow provenance.

  1. Bioinformatics process management: information flow via a computational journal
  2. Biopipe: a flexible framework for protocol-based bioinformatics analysis
  3. Developing a protocol for bioinformatics analysis: an integrated information behavior and task analysis approach

Data Provenance (database)

Data provenance was pioneered within the database community (Buneman et al. 2001; Cui et al. 2000; Woodruff and Stonebraker 1997). Data provenance research focuses on issues of importance in database settings and has been inspired by computational methods suitable for, and facilitated by, databases. For example, why-provenance finds the source tuples that explain why a tuple was derived, and where-provenance finds the portion of the sources that was copied to a portion of the derived tuple. This kind of provenance can be viewed as a specialized workflow step whose action can be recorded by a declarative query and a declarative inverse function. Notably, some data provenance work has been generalized to workflow provenance, e.g. in e-science, while the narrower "data provenance" remains in the database domain. For a more detailed overview, see the surveys by Glavic and Dittrich (2007) and Tan (2007).
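The difference between why provenance and where provenance can be sketched on a toy join. The relations, tuple identifiers, and the function `join_with_provenance` below are assumptions made for illustration only. The sketch is also simplified: it records a single witness per output tuple, whereas the formal definitions in the cited work are more general.

```python
# Hypothetical source relations, keyed by tuple identifier.
employees = {
    "e1": {"name": "Ann", "dept": "D1"},
    "e2": {"name": "Bob", "dept": "D2"},
}
departments = {
    "d1": {"dept": "D1", "city": "Troy"},
    "d2": {"dept": "D2", "city": "Albany"},
}


def join_with_provenance(employees, departments):
    """Join on `dept`, projecting (name, city) and tracking provenance."""
    results = []
    for eid, emp in employees.items():
        for did, dep in departments.items():
            if emp["dept"] == dep["dept"]:
                results.append({
                    "tuple": (emp["name"], dep["city"]),
                    # Why: the source tuples that justify this output tuple.
                    "why": {("employees", eid), ("departments", did)},
                    # Where: the source location each output field was copied from.
                    "where": {
                        "name": ("employees", eid, "name"),
                        "city": ("departments", did, "city"),
                    },
                })
    return results


rows = join_with_provenance(employees, departments)
```

For the output tuple `("Ann", "Troy")`, the why entry names both contributing source tuples, while the where entry points each field back to the single source cell it was copied from.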

researchers

resources

References

  1. Why and where: a characterization of data provenance
  2. Tracing the lineage of view data in a warehousing environment
  3. Supporting fine-grained data lineage in a database visualization environment
  4. Research problems in data provenance
  5. Data provenance: a categorization of existing approaches
  6. Provenance in databases: past, current, and future

Knowledge Provenance (AI)

Knowledge provenance (McGuinness and Pinheiro da Silva 2004; Fox and Huang 2003) focuses on issues of importance in knowledge base settings, which typically include those of importance in database settings as well as concerns arising from reasoning (potentially hybrid reasoning). For example, applications may need provenance for the results of text analytic programs that are integrated into knowledge bases and processed by first-order reasoners (Murdock et al. 2006).

Provenance in distributed information systems (Weitzner et al. 2006) is an interesting direction in provenance research. Unlike many e-science workflows that simply compose services into a sequence, the workflow in such systems involves many interactive communication protocols as well.

References

  1. Explaining answers from the Semantic Web: the Inference Web approach
  2. Knowledge provenance
  3. Explaining conclusions from diverse knowledge sources
  4. Transparent accountable data mining: new strategies for privacy protection
  5. PML 2: a modular explanation interlingua

Research Directions

Provenance Metadata

  • reference information (aka digital object, statements)
  • reference and classify entities involved in information manipulation
  • annotate provenance attributes
  • represent information manipulation process in terms of plan and log



Provenance Computation

  • classify the computation on provenance metadata
  • list application domain and scenarios for provenance
  • provenance metadata management (storage, access, query)
  • provenance aware user interaction


Provenance Systems

System   Pvd                              Representation scheme  Application domain   Discussed by
CMCS     Data-oriented                    Annotation             Chemical Sciences
Chimera  Process-oriented                 Annotation             Physics, Astronomy   Simmhan2005survey, Foster2002chimera, Foster2003virtual
ESSW     Process-oriented, Data-oriented  Annotation             Earth Sciences
LIP      Data-oriented                    Annotation             GIS
MyGrid   Process-oriented                 Annotation             Biology              Simmhan2005survey, Greenwood2003provenance, Zhao2004using, Goble2002position, Zhao2003annotating, Zhao2004semantically, Stevens2003mygrid
PASOA    Process-oriented                 Annotation             Biology              Simmhan2005survey, Miles2005requirements, Brase2004using, Groth2005recording, Groth2004protocol
Tioga    Data-oriented                    Inversion              Atmospheric Science
Trio     Data-oriented                    Inversion              Generic


Literature Survey

OWL/RDFS Provenance Ontology

The following Semantic Web ontologies are related to provenance:

1. Open Provenance Model (OPM) v1.1 is encoded using

2. Proof Markup Language (PML) v2.0 consists of three modular ontologies

3. XMDR

4. Provenance Vocabulary

5. Provenir (Kno.e.sis Center, Wright State University, USA)

6. OBO Foundry

7. Basic Formal Ontology (BFO) http://www.ifomis.org/bfo

8. VOID Ontology

9. Semantic Publishing Ontology (signature)

10. Dublin Core

11. Creative Commons

12. Web of Trust (WOT)

13. ChangeSet Ontology

14. Ontology Design Patterns: a huge collection of modular ontologies

15. WGS84 Geo Positioning Ontology

16. iCal ontology

17. OWL Time Ontology

18. ORE (Open Archives Initiative Object Reuse and Exchange)

19. RSS Event Ontology (the ontology is offline)

20. Workflow Driven Ontology (WDO)

21. OWL-S Ontology

22. BioPAX Ontology

23. FOAF

24. Biological Processes Ontology

Provenance Vocabulary Specifications from US Government

References

  1. The open provenance model
  2. PML 2: a modular explanation interlingua