A survey of data provenance techniques

abstract: Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources. The provenance of data products generated by complex transformations such as workflows is of considerable value to scientists. From it, one can ascertain the quality of the data based on its ancestral data and derivations, trace back sources of errors, allow automated re-enactment of derivations to update the data, and provide attribution of data sources. Provenance is also essential to the business domain, where it can be used to drill down to the source of data in a data warehouse, track the creation of intellectual property, and provide an audit trail for regulatory purposes. In this paper we create a taxonomy of data provenance techniques and apply the classification to current research efforts in the field. The main aspect of our taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and how they disseminate it. Our synthesis can help those building scientific and business metadata-management systems to understand existing provenance system designs. The survey culminates with an identification of open research problems in the field.

Author: Yogesh L. Simmhan, Beth Plale, and Dennis Gannon
Bibtype: techreport
Booktitle: Technical Report TR-612, Computer Science Department, Indiana University
Institution: Computer Science Department, Indiana University
Key: simmhan2005bSurvey
Number: 612
Tag: Data provenance, Computer science
Title: A Survey of Data Provenance Techniques
Year: 2005