Semantic Provenance for Science Data Products - Application to Image Data Processing

A challenge in providing scientific data services to a broad user base is to also provide the metadata services and tools the user base needs to correctly interpret and trust the provided data. Provenance metadata is especially vital to establishing trust, giving the user information on the conditions under which the data originated and any processing that was applied to generate the data product provided.
In this paper, we describe our work on a federated set of data services in the area of solar coronal physics. These data services pose a particular challenge because there are decades of existing data whose provenance we will have to reconstruct, and because the quality of the final data product is highly sensitive to data capture conditions, information that is not currently propagated with the data.
We describe our use of semantic technologies for encoding provenance and domain knowledge and show how provenance and domain ontologies can be used together to satisfy complex use cases. We show our progress on provenance search and visualization tools and highlight the need for semantics in the user tools. Finally, we describe how our methods are applicable to generic data processing systems.

Associated Projects

The goal of this project is to develop a semantically-enabled data ingest capability at the RPI Tetherless World Constellation, based within the NCAR High Altitude Observatory and in collaboration with the University of Texas at El Paso, the University of Michigan, and McGuinness Associates. The project is entitled Semantic Provenance Capture in Data Ingest Systems (SPCDIS). Initially, we will limit our focus to a set of solar coronal physics instruments, but over time we will target the broader area of solar and solar-terrestrial physics.