Extending eScience Provenance with User-Submitted Semantic Annotations


Presented at the AGU Fall Meeting 2010


eScience-based systems generate provenance for their data products, covering such things as data processing, data collection conditions, expert evaluation, and data product quality. Recent advances in web-based technology let users annotate both data products and the steps in their accompanying provenance traces, thereby expanding the utility of that provenance for others. These contributing users may have varying backgrounds, ranging from system experts to outside domain experts to citizen scientists. Such users may also wish to make varying types of annotations, ranging from documenting the purpose of a provenance step to raising concerns about the quality of data dependencies. Semantic Web technologies allow such rich annotations to be made to provenance through the use of ontology vocabularies for (i) organizing provenance and (ii) organizing user/annotation classifications. Furthermore, through Linked Data practices, semantic links may be made from provenance steps to external data of interest.
The desire for semantically annotated provenance has been motivated by data management issues in the Mauna Loa Solar Observatory's (MLSO) Advanced Coronal Observing System (ACOS). In ACOS, photometer-based readings of solar activity are taken and subsequently processed into final data products consumable by end users. At intermediate stages of ACOS processing, factors that could impact data product quality, such as evaluations by human experts and weather conditions, are logged. Linking such factors to provenance via user-submitted annotations could significantly benefit other users. Likewise, the background of a user could affect the credibility of their annotations. For example, an annotation made by a citizen scientist describing the purpose of a provenance step may not be as reliable as a similar annotation made by an ACOS project member.
For this work, we have developed a software package that records the provenance of data products in the Proof Markup Language (PML), provides a user/annotation classification ontology, and provides a browsing interface that lets users inspect PML-based provenance at varying degrees of abstraction and add and view multiple types of annotations. Although it was developed with ACOS-based provenance in mind, the package remains domain independent, making it easily extensible to other eScience systems.

Related Projects:

Semantic Provenance Capture in Data Ingest Systems (SPCDIS)
Principal Investigator: Peter Fox
Co-Investigator: Deborah L. McGuinness
Description: The goal of this project is to develop a semantically-enabled data ingest capability at the RPI Tetherless World Constellation, based within the NCAR High Altitude Observatory and in collaboration with the University of Texas at El Paso, the University of Michigan, and McGuinness Associates.

Related Research Areas:

Inference And Trust
Lead Professor: Deborah L. McGuinness
Description: Inference And Trust
Knowledge Provenance
Lead Professor: Deborah L. McGuinness
Description: Knowledge Provenance