Linked provenance data: A semantic Web-based approach to interoperable workflow traces



The Third Provenance Challenge (PC3) offered an opportunity for provenance researchers to evaluate the interoperability of leading provenance models, with special emphasis on importing and querying workflow traces generated by others. We investigated interoperability issues related to reusing Open Provenance Model (OPM)-based workflow traces. We compiled data about interoperability issues that were observed during PC3 and used that data to describe and motivate solution paths for two outstanding interoperability issues in OPM-based provenance data reuse: (i) a provenance trace often requires both generic provenance data and domain-specific data to support future reuse (such as querying); (ii) diverse provenance traces (possibly from different sources) often require preservation and interconnection to support future aggregation and comparison. To address these issues and to facilitate interoperable reuse, integration, and alignment of provenance data, we propose a Semantic Web-based approach known as Linked Provenance Data, where: (i) the Web Ontology Language (OWL) can be used to support complex domain concept modeling, such as subtype taxonomy and concept alignment, and to seamlessly connect domain extensions to OPM core concepts; (ii) Linked Data can enable an open and transparent infrastructure for provenance data reuse.
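As a rough illustration of point (i), the sketch below shows how a subtype taxonomy could let a generic OPM-level query retrieve items that a trace typed only with domain-specific classes. It is a minimal sketch under stated assumptions, not the implementation described in the paper: the `opm:` and `ex:` terms, the sample triples, and the helper functions are all hypothetical, and a plain Python set of triples stands in for an RDF store with OWL reasoning.

```python
# Hypothetical names throughout: the "opm:" and "ex:" terms are illustrative,
# not taken from the OPM specification or from the PC3 traces.
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

triples = {
    # Domain extension: domain-specific process types tied to an OPM core class.
    ("ex:LoadCsvStep", SUBCLASS_OF, "ex:DatabaseLoadStep"),
    ("ex:DatabaseLoadStep", SUBCLASS_OF, "opm:Process"),
    # Trace data: executions typed with domain-level classes only.
    ("ex:run1", TYPE, "ex:LoadCsvStep"),
    ("ex:run2", TYPE, "ex:DatabaseLoadStep"),
    ("ex:file1", TYPE, "opm:Artifact"),
}


def superclasses(cls, graph):
    """Return cls plus every class reachable through rdfs:subClassOf."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def instances_of(cls, graph):
    """Find resources whose asserted type is cls or any subclass of it."""
    return sorted(
        s for s, p, o in graph
        if p == TYPE and cls in superclasses(o, graph)
    )


# A generic query sees both runs; a domain-specific query narrows the result.
generic = instances_of("opm:Process", triples)
specific = instances_of("ex:LoadCsvStep", triples)
print(generic, specific)
```

The same trace answers both the generic and the domain-specific question because the domain classes are declared as subtypes of the core class rather than stored in a separate vocabulary.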


Date | Created By | Link
July 19, 2011 | James Michaelis | Download
July 18, 2011 | James Michaelis | Download
March 3, 2011 | Patrick West | Download

Related Projects:

Inference Web
Principal Investigator: Deborah L. McGuinness
Description: The Inference Web is a Semantic Web-based knowledge provenance infrastructure that supports interoperable explanations of sources, assumptions, learned information, and answers as an enabler for trust.
Provenance - if users (humans and agents) are to use and integrate data from unknown, uncertain, or multiple sources, they need provenance metadata for evaluation.
Interoperability - more systems are using varied sources and multiple information manipulation engines, thus increasing interoperability requirements.
Explanation/Justification - if information has been manipulated (i.e., by sound deduction or by heuristic processes), information manipulation trace information should be available.
Trust - if some sources are more trustworthy than others, trust ratings are desired.
The Inference Web consists of two important components:
Proof Markup Language (PML) Ontology - Semantic Web-based representation for exchanging explanations, including:
provenance information - annotating the sources of knowledge
justification information - annotating the steps for deriving the conclusions or executing workflows
trust information - annotating trustworthiness assertions about knowledge and sources
IW Toolkit - Web-based and standalone tools that help human users browse, debug, explain, and abstract the knowledge encoded in PML.

Related Research Areas:

Data Frameworks
Lead Professor: Peter Fox
Description: None.
Concepts: eScience
Knowledge Provenance
Lead Professor: Deborah L. McGuinness
Description: Knowledge Provenance
Concepts: Provenance, Semantic Web
Semantic eScience
Lead Professor: Peter Fox
Description: Science has fully entered a new mode of operation. E-science, defined as a combination of science, informatics, computer science, cyberinfrastructure, and information technology, is changing the way all of these disciplines do both their individual and collaborative work.
As semantic technologies have been gaining momentum in various e-Science areas (for example, W3C's new interest group for semantic web health care and life science), it is important to offer semantic-based methodologies, tools, and middleware to facilitate scientific knowledge modeling, logic-based hypothesis checking, semantic data integration and application composition, and integrated knowledge discovery and data analysis for different e-Science applications.
Partially influenced by the Artificial Intelligence community, Semantic Web researchers have largely focused on formal aspects of semantic representation languages or general-purpose semantic application development, with inadequate consideration of requirements from specific science areas. On the other hand, general science researchers are growing ever more dependent on the web, but they have no coherent agenda for exploring the emerging trends in semantic web technologies. This situation urgently calls for the development of a multi-disciplinary field to foster the growth and development of e-Science applications based on semantic technologies and related knowledge-based approaches.

Concepts: eScience
Web Science
Lead Professor: Jim Hendler, Deborah L. McGuinness
Description: Web Science is the study of the World Wide Web and its impact on both society and technology, positioning the Web as an object of scientific study unto itself. Web Science recognizes the Web as a transformational, disruptive technology; its practitioners study the Web, its components, facets and characteristics. Ultimately, Web Science is about understanding the Web and anticipating how it might evolve in the future.
Concepts: Semantic Web