Integrating Provenance into an operational Data Product Information System


Knowledge of how a science data product was generated is critical to determining its fitness-for-use for a given analysis. One objective of science information systems is to allow users to search for data products based on a wide range of criteria; spatial and temporal extent, observed parameter, research domain, and organizational project are common search criteria. Currently, science information systems are geared towards helping users find data, but not towards helping them determine how the products were generated. An information system that exposes the provenance of available data products, that is, what observations, assumptions, and science processing were involved in their generation, would significantly benefit users' fitness-for-use decision-making.

In this work we discuss semantics-driven provenance extensions to the Virtual Solar Terrestrial Observatory (VSTO) information system. The VSTO semantic web portal uses an ontology to provide a unified search and product retrieval interface to data in the fields of solar, solar-terrestrial, and space physics. We have developed an extension to the VSTO ontology that allows it to express item-level data product records. We will show how the Open Provenance Model (OPM) and the Proof Markup Language (PML) can be used to express the provenance of data product records. Additionally, we will discuss ways in which domain semantics can aid in both the formulation and the answering of provenance queries. Our extension to the VSTO ontology has also been integrated with a solar-terrestrial profile of the Observation and Measurement (O&M) model to support domain-specific descriptions of solar-terrestrial observations; we use this integration to connect observation events to the data product record lineage. Our additions to the VSTO ontology will allow us to extend the VSTO web portal user interface with search criteria based on provenance and observation characteristics. More critically, provenance information will allow the VSTO portal to display important knowledge about selected data records: what science processes and assumptions were applied to generate the record, what observations the record derives from, and the results of quality processing that has been applied to the record and to any records it derives from. We conclude by presenting our interface for displaying record provenance information and discuss how it aids users in determining the fitness-for-use of the data.
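To make the idea of a provenance query concrete, the following is a minimal sketch in Python of a lineage lookup over OPM-style edges. It is an illustration only, not the VSTO ontology or its implementation: the relation names `used` and `wasGeneratedBy` come from the OPM vocabulary, while every record and process identifier, the `EDGES` list, and the `lineage` and `source_processes` helpers are hypothetical names introduced here.

```python
from collections import deque

# OPM-style edges as (subject, relation, object) triples.
# All record and process identifiers below are hypothetical examples.
EDGES = [
    ("record:calibrated_image", "wasGeneratedBy", "process:flat_field_calibration"),
    ("process:flat_field_calibration", "used", "record:raw_image"),
    ("process:flat_field_calibration", "used", "record:flat_field"),
    ("record:raw_image", "wasGeneratedBy", "process:observation_event"),
    ("record:quicklook", "wasGeneratedBy", "process:quicklook_rendering"),
    ("process:quicklook_rendering", "used", "record:calibrated_image"),
]


def lineage(node, edges=EDGES):
    """Return everything upstream of `node`, in breadth-first order."""
    # Index the edges so each node maps to the things it depends on.
    upstream = {}
    for subj, _relation, obj in edges:
        upstream.setdefault(subj, []).append(obj)

    found = []
    visited = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in upstream.get(current, []):
            if parent not in visited:
                visited.add(parent)
                found.append(parent)
                queue.append(parent)
    return found


def source_processes(node, edges=EDGES):
    """Return only the processes that contributed to `node`."""
    return [n for n in lineage(node, edges) if n.startswith("process:")]


if __name__ == "__main__":
    for item in lineage("record:quicklook"):
        print(item)
```

A question such as "what observations does this record derive from" then reduces to walking the graph upstream from the selected record and filtering the result by node type. A real system would more likely store these edges as RDF and query them with SPARQL; the in-memory list is used here only to keep the sketch self-contained.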


Date: June 18, 2012
Created By: Stephan Zednik

Related Projects:

Semantic Provenance Capture in Data Ingest Systems (SPCDIS)
Principal Investigator: Peter Fox
Co Investigator: Deborah L. McGuinness
Description: The goal of this project is to develop a semantically-enabled data ingest capability at the RPI Tetherless World Constellation, based within the NCAR High Altitude Observatory and in collaboration with the University of Texas at El Paso, the University of Michigan, and McGuinness Associates.
Semantic eScience Framework (SeSF)
Principal Investigator: Peter Fox
Co Investigator: Jim Hendler and Deborah L. McGuinness
Description: Over the past few years, semantic technologies have evolved and new tools are appearing. Part of the effort in this project will be to accommodate these advances in the new framework and lay out a sustainable software path for the (certain) technical advances. In addition to a generalization of the current data science interface, we will include an upper-level interface suitable for use by clearinghouses, and/or educational portals, digital libraries, and other disciplines.
Virtual Solar Terrestrial Observatory (VSTO)
Principal Investigator: Peter Fox
Co Investigator: Deborah L. McGuinness
Description: VSTO is a collaborative project between the High Altitude Observatory and Scientific Computing Division of the National Center for Atmospheric Research and McGuinness Associates. VSTO is funded by a grant from the National Science Foundation, Computer and Information Science and Engineering (CISE) in the Shared Cyberinfrastructure (SCI) division.

Related Research Areas:

Knowledge Provenance
Lead Professor: Deborah L. McGuinness
Description: Knowledge Provenance
Concepts: Provenance
Semantic eScience
Lead Professor: Peter Fox
Science has fully entered a new mode of operation. E-science, defined as a combination of science, informatics, computer science, cyberinfrastructure, and information technology, is changing the way all of these disciplines do both their individual and collaborative work.
As semantic technologies have been gaining momentum in various e-Science areas (for example, W3C's new interest group for semantic web health care and life science), it is important to offer semantic-based methodologies, tools, and middleware to facilitate scientific knowledge modeling, logic-based hypothesis checking, semantic data integration and application composition, and integrated knowledge discovery and data analysis for different e-Science applications.
Partially influenced by the Artificial Intelligence community, Semantic Web researchers have largely focused on formal aspects of semantic representation languages or general-purpose semantic application development, with inadequate consideration of requirements from specific science areas. On the other hand, general science researchers are growing ever more dependent on the web, but they have no coherent agenda for exploring the emerging trends in semantic web technologies. This situation urgently calls for the development of a multi-disciplinary field to foster the growth and development of e-Science applications based on semantic technologies and related knowledge-based approaches.

Concepts: eScience