Towards a common provenance model for research publications

Presented at the AGU Fall Meeting 2014

Concepts: Provenance & eScience


Provenance is information about the entities, activities, and people involved in producing a piece of data or a thing, which can be used to assess its quality, reliability, or trustworthiness. In a research publication, provenance covers the entities, activities, and people involved in the process that leads to parts of the publication such as figures, tables, and paragraphs. Readers often need this information to interpret the publication's content correctly. It also lets them evaluate the credibility of the reported results by examining the software used, the source data, and the responsible agents, or even by reproducing the results themselves. In this presentation, we describe an ontology designed to model the preparation of research publications, based on our experience from two projects that both focus on capturing provenance for research publications. The first project captures provenance information for a National Climate Assessment (NCA) report of the US Global Change Research Program (USGCRP); the second captures provenance information for an Ecosystem Status Report (ESR) of the Northeast Fisheries Science Center (NEFSC). Both projects base their provenance modeling on the W3C Provenance Ontology (PROV-O), which proves to be an effective way to create models for provenance capture. We illustrate the commonalities and differences between the use cases of the two projects and show how we derive a common model from the models designed specifically for each project.
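As a rough illustration of the entity/activity/agent pattern described above, the following minimal sketch stores a few PROV-O-style statements as plain Python tuples and traces how a figure was produced. The property and class names (`prov:Entity`, `prov:Activity`, `prov:Agent`, `prov:wasGeneratedBy`, `prov:used`, `prov:wasAssociatedWith`) are PROV-O terms; everything under the `ex:` prefix, along with the helper functions `objects` and `trace`, is hypothetical and not taken from the NCA or ESR models.

```python
# Illustrative sketch only: the "ex:" identifiers are invented for this
# example and do not come from the NCA or ESR provenance models.
triples = {
    ("ex:figure1", "rdf:type", "prov:Entity"),
    ("ex:dataset1", "rdf:type", "prov:Entity"),
    ("ex:plotting", "rdf:type", "prov:Activity"),
    ("ex:analyst", "rdf:type", "prov:Agent"),
    # The figure was generated by a plotting activity...
    ("ex:figure1", "prov:wasGeneratedBy", "ex:plotting"),
    # ...which used a source dataset...
    ("ex:plotting", "prov:used", "ex:dataset1"),
    # ...and was carried out by a responsible agent.
    ("ex:plotting", "prov:wasAssociatedWith", "ex:analyst"),
}


def objects(subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def trace(entity):
    """List each activity that generated `entity`, with its inputs and agents."""
    return [
        {
            "activity": activity,
            "used": objects(activity, "prov:used"),
            "agents": objects(activity, "prov:wasAssociatedWith"),
        }
        for activity in objects(entity, "prov:wasGeneratedBy")
    ]


if __name__ == "__main__":
    for step in trace("ex:figure1"):
        print(step)
```

A reader-facing tool could walk such statements from a figure back to its source data and responsible agents; a real deployment would more likely use an RDF store and SPARQL queries than in-memory tuples.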


Date: December 15, 2014
Created By: Linyun Fu
Link: Download

Related Projects:

Employing Cyber Infrastructure Data Technologies to Facilitate IEA for Climate Impacts in NE & CA LME's (ECO-OP)
Principal Investigator: Peter Fox
Co Investigator: Andrew Maffei
Description: The purpose of this INTEROP proposal is to facilitate the deployment of an Integrated Ecosystem Approach (IEA) to management in the Northeast and California Current Large Marine Ecosystems (LMEs). The direct result of the proposed activity will be enhanced application-level data and information communication for developing the consensus networks that define the specific components of interest, in support of implementing NOAA’s Driver-Pressure-State-Impact-Response (DPSIR) decision framework, along with the cyberinfrastructure technologies to ensure data interoperability and reuse.

Related Research Areas:

Knowledge Provenance
Lead Professor: Deborah L. McGuinness
Description: Knowledge Provenance
Concepts: Provenance

Semantic eScience
Lead Professor: Peter Fox
Science has fully entered a new mode of operation. E-science, defined as a combination of science, informatics, computer science, cyberinfrastructure, and information technology, is changing the way all of these disciplines do both their individual and collaborative work.
As semantic technologies gain momentum in various e-Science areas (for example, W3C's new interest group for Semantic Web health care and life sciences), it is important to offer semantics-based methodologies, tools, and middleware to facilitate scientific knowledge modeling, logic-based hypothesis checking, semantic data integration and application composition, and integrated knowledge discovery and data analysis for different e-Science applications.
Partly influenced by the Artificial Intelligence community, Semantic Web researchers have largely focused on formal aspects of semantic representation languages or on general-purpose semantic application development, with inadequate consideration of the requirements of specific science areas. General science researchers, on the other hand, are growing ever more dependent on the web but have no coherent agenda for exploring emerging trends in Semantic Web technologies. There is an urgent need to develop a multidisciplinary field that fosters the growth and development of e-Science applications based on semantic technologies and related knowledge-based approaches.

Concepts: eScience