Prior Relevant Work

This work relates closely to a number of areas in provenance and workflow research. Below, we list a selection of these areas:

Workflow Management Systems

In domains such as eScience, workflows are commonly used to automate the execution of task sequences and to manage the generation of various types of data products. A variety of workflow management systems have been developed to help users both create and execute workflows. Increasingly, these systems incorporate provenance-tracking functionality, with provenance logged at workflow execution time. We have identified two workflow management systems that enable selective views of the provenance of data products.

Redux [1]: This system can present provenance based on four layers of processing that take place during the execution of a workflow:

  1. An abstract description of the experiment that captures abstract activities in a corresponding workflow.
  2. An instance of the abstract model, which captures instances of activities and additional relationships as classes of activities are instantiated.
  3. Information to trace the execution of the workflow, including input data, parameters supplied at runtime, branches taken, and activities inserted or skipped during execution.
  4. Runtime-specific information, such as the start and end time of workflow execution, start and end time of individual activity execution, status codes and intermediate results, information about the internal state of each activity, along with information about the machines where activities were allocated.
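To make the idea of layer-based selective views concrete, the following minimal Python sketch tags each provenance entry with one of the four layers above and filters a record down to a chosen level of detail. The class names, fields, and sample entries are hypothetical and are not taken from Redux itself.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Layer(IntEnum):
    """Redux-style layers, ordered from most abstract (1) to most detailed (4)."""
    ABSTRACT = 1   # abstract description of the experiment
    INSTANCE = 2   # instantiated activities and their relationships
    EXECUTION = 3  # inputs, runtime parameters, branches taken, inserted/skipped activities
    RUNTIME = 4    # timings, status codes, internal state, machine allocation


@dataclass
class ProvenanceEntry:
    layer: Layer
    subject: str  # hypothetical: the activity or data item the entry describes
    detail: str


@dataclass
class ProvenanceRecord:
    entries: list[ProvenanceEntry] = field(default_factory=list)

    def add(self, layer: Layer, subject: str, detail: str) -> None:
        self.entries.append(ProvenanceEntry(layer, subject, detail))

    def view(self, max_layer: Layer) -> list[ProvenanceEntry]:
        """Selective view: keep only entries no more detailed than max_layer."""
        return [e for e in self.entries if e.layer <= max_layer]


# Hypothetical usage: one activity described at three different layers.
record = ProvenanceRecord()
record.add(Layer.ABSTRACT, "align", "align images to a reference image")
record.add(Layer.EXECUTION, "align", "input=anatomy1.img")
record.add(Layer.RUNTIME, "align", "finished with status 0")
print([e.detail for e in record.view(Layer.INSTANCE)])
```

Ordering the layers as an `IntEnum` lets a single comparison express "show me everything up to this level of detail".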

VisTrails [2]: This system allows users to view provenance based on three different layers of content:

  1. Workflow specifications
  2. Change histories for workflows
  3. Workflow executions
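VisTrails records workflow change history as a version tree, in which each version is derived from its parent by a change action. The sketch below is a heavily simplified, hypothetical illustration of that idea, not VisTrails code: each version stores a single add or delete action, and a workflow specification is rebuilt by replaying the actions on the path from the root. The module names are borrowed from the Provenance Challenge workflow purely as examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    """One change to a workflow specification (hypothetical two-operation vocabulary)."""
    op: str      # "add_module" or "delete_module"
    module: str


@dataclass
class Version:
    parent: Optional[int]     # id of the parent version; None for the root
    action: Optional[Action]  # change that produced this version; None for the root


class VersionTree:
    def __init__(self) -> None:
        # Version 0 is the empty root workflow.
        self.versions: dict[int, Version] = {0: Version(None, None)}

    def commit(self, parent: int, action: Action) -> int:
        """Create a new version by applying one action to an existing version."""
        vid = len(self.versions)
        self.versions[vid] = Version(parent, action)
        return vid

    def materialize(self, vid: int) -> set[str]:
        """Rebuild the specification at version vid by replaying actions from the root."""
        path: list[Action] = []
        current: Optional[int] = vid
        while current is not None:
            v = self.versions[current]
            if v.action is not None:
                path.append(v.action)
            current = v.parent
        modules: set[str] = set()
        for action in reversed(path):  # oldest change first
            if action.op == "add_module":
                modules.add(action.module)
            elif action.op == "delete_module":
                modules.discard(action.module)
        return modules


tree = VersionTree()
v1 = tree.commit(0, Action("add_module", "align_warp"))
v2 = tree.commit(v1, Action("add_module", "reslice"))
v3 = tree.commit(v1, Action("add_module", "softmean"))  # a branch from v1
v4 = tree.commit(v2, Action("delete_module", "align_warp"))
print(sorted(tree.materialize(v2)))
```

Storing changes rather than full snapshots is what lets the change history itself serve as a layer of provenance alongside specifications and executions.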

Provenance Abstraction Methodologies

ZOOM [3]: This work explores algorithms for generating abstractions of workflows based on a given user's interest in specific components (termed user views of a workflow in this work). These workflow abstractions can, in turn, be run using a provided workflow engine (http://zoomuserviews.db.cis.upenn.edu/cgi-bin/pmwiki.php?n=ZoomUserViews.Software), yielding an end result (e.g., a dataset) plus an abstracted trace of the execution provenance. The provided workflow engine requires a user to select the desired workflow components from a fine-grained workflow definition.
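The sketch below illustrates only the final step of this process: collapsing a fine-grained execution trace according to a user view that is assumed to be given as a mapping from modules to composite modules. It is not ZOOM's view-construction algorithm, which derives such views from the set of modules a user considers relevant; the function name, data layout, and module names are hypothetical.

```python
from collections import defaultdict


def abstract_trace(edges, view):
    """Collapse a fine-grained provenance trace according to a user view.

    edges: iterable of (producer_module, consumer_module, data_id) tuples.
    view:  dict mapping each fine-grained module to its composite module.
    Returns {(composite_src, composite_dst): set_of_data_ids}, keeping only data
    that flows between composites; data exchanged inside a composite is hidden.
    """
    abstract = defaultdict(set)
    for src, dst, data in edges:
        csrc, cdst = view[src], view[dst]
        if csrc != cdst:
            abstract[(csrc, cdst)].add(data)
    return dict(abstract)


edges = [
    ("align_warp", "reslice", "warp1"),
    ("reslice", "softmean", "resliced1"),
    ("softmean", "slicer", "atlas"),
]
# Hypothetical user view: align_warp and reslice are grouped into composite M1.
view = {"align_warp": "M1", "reslice": "M1", "softmean": "M2", "slicer": "M3"}
print(abstract_trace(edges, view))
```

Here the intermediate `warp1` data item disappears from the abstracted trace because both its producer and consumer fall inside the same composite module.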

Provenance Access Control

SCOPE [4] (Scientific Compound Object Publishing and Editing) is a semantic publication system centered around the management of Semantic Web resources. With SCOPE, scientists can create and manage collections of resources known as compound objects.

To organize compound objects, SCOPE uses named graphs [5], which allow collections of RDF triples to be referenced through URIs. Through the use of named graphs, provenance can easily be attached to compound objects. For example, SCOPE allows the definition of compound objects representing a set of raw data, as well as the corresponding data processed through a sequence of steps. A provenance trail can then be established linking the compound object for the processed data to the compound object for the raw data.
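Because named graphs give each set of triples a URI, provenance statements can refer to whole compound objects. The following plain-Python sketch models named graphs as a dictionary from graph URI to triples and follows a derivation trail from processed data back to raw data. The URIs and the `derivedFrom` predicate are hypothetical, and a real deployment would use an RDF store rather than Python sets.

```python
from collections import deque

# Hypothetical vocabulary term and graph URIs, used only for illustration.
DERIVED_FROM = "http://example.org/vocab#derivedFrom"
RAW = "http://example.org/co/raw"
CALIBRATED = "http://example.org/co/calibrated"
PROCESSED = "http://example.org/co/processed"
FORMAT = "http://example.org/vocab#format"

# Each named graph (one compound object) maps a graph URI to the triples it contains.
named_graphs: dict[str, set[tuple[str, str, str]]] = {
    RAW: {("http://example.org/data/run1", FORMAT, "CSV")},
    CALIBRATED: {("http://example.org/data/run1-cal", FORMAT, "CSV")},
    PROCESSED: {("http://example.org/data/run1-plot", FORMAT, "PNG")},
}

# Because graphs have URIs, provenance statements can use them as subjects and objects.
provenance: set[tuple[str, str, str]] = {
    (PROCESSED, DERIVED_FROM, CALIBRATED),
    (CALIBRATED, DERIVED_FROM, RAW),
}


def provenance_trail(graph_uri: str, statements: set[tuple[str, str, str]]) -> list[str]:
    """Return every named graph that graph_uri was transitively derived from, nearest first."""
    trail: list[str] = []
    seen = {graph_uri}
    queue = deque([graph_uri])
    while queue:
        current = queue.popleft()
        # Sorting keeps the traversal order deterministic when a graph has several sources.
        for s, p, o in sorted(statements):
            if s == current and p == DERIVED_FROM and o not in seen:
                seen.add(o)
                trail.append(o)
                queue.append(o)
    return trail


print(provenance_trail(PROCESSED, provenance))
```

The breadth-first traversal also handles compound objects derived from several sources, and the `seen` set guards against cycles in the provenance statements.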

In SCOPE, user interaction with provenance is managed through Provenance Explorer, a browsing system capable of presenting abstracted provenance to end users. Provenance Explorer lets users retrieve fine-grained provenance detail, subject to an associated access control system: depending on their access rights, users can choose to expand provenance steps to obtain finer detail.
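Access-controlled expansion can be sketched as a recursive walk over a tree of provenance steps. In this hypothetical model, each step carries an integer access level (an assumption made for illustration; SCOPE's actual access control model is not reproduced here), and a step is expanded only when the user's level is sufficient. Unlike Provenance Explorer, where users choose which steps to expand, this sketch expands every step the user is permitted to.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    required_level: int = 0  # hypothetical: minimum access level needed to expand this step
    substeps: list["Step"] = field(default_factory=list)


def expand(step: Step, user_level: int) -> list[str]:
    """Flatten a provenance step into the finest-grained step names this user may see.

    A step is replaced by its sub-steps only if it has sub-steps and the user's
    access level meets the step's requirement; otherwise it stays abstract.
    """
    if step.substeps and user_level >= step.required_level:
        names: list[str] = []
        for sub in step.substeps:
            names.extend(expand(sub, user_level))
        return names
    return [step.name]


# Hypothetical pipeline: "denoise" needs a higher access level than "process".
pipeline = Step("process", 1, [
    Step("calibrate"),
    Step("denoise", 2, [Step("filter"), Step("threshold")]),
])
for level in (0, 1, 2):
    print(level, expand(pipeline, level))
```

Users with higher access levels see progressively finer-grained steps, while the abstract step names remain visible to everyone else.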

How our work relates to these projects

We see our work as particularly relevant to the ZOOM and SCOPE projects, which appear to use complementary mechanisms for managing provenance complexity. In ZOOM, a user starts with fine-grained workflow specifications and interacts with this information to produce abstract provenance relevant to their needs. Conversely, in SCOPE, a user is presented with abstract provenance and must expand abstracted steps to get additional detail.

Each approach appears to involve trade-offs. In ZOOM, a user must sift through fine-grained information to identify what they are looking for; to do this effectively, they need enough background knowledge to know what to look for. Similarly, in SCOPE, a user who already knows what they are looking for must spend time drilling down through abstract provenance to expose the information of interest.

In our work, we want to explore whether user roles can be defined to express what certain classes of users will be familiar with in provenance records. By extension, we want to establish ways in which certain types of provenance elements can be mapped to user roles.
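As a starting point for this question, a role-to-element mapping could be as simple as a table listing the provenance element types each role is assumed to be familiar with, used to filter a provenance record. The roles, element types, and sample record below are hypothetical placeholders, not results of our work.

```python
# Hypothetical mapping from user roles to the provenance element types they are
# assumed to be familiar with; defining such a mapping is the open question above.
ROLE_FAMILIARITY: dict[str, set[str]] = {
    "domain_scientist": {"experiment", "dataset", "parameter"},
    "workflow_engineer": {"experiment", "activity", "parameter", "runtime"},
    "system_administrator": {"activity", "runtime", "machine"},
}


def view_for_role(elements: list[tuple[str, str]], role: str) -> list[tuple[str, str]]:
    """Keep only the (element_type, description) pairs familiar to the given role."""
    familiar = ROLE_FAMILIARITY[role]
    return [e for e in elements if e[0] in familiar]


record = [
    ("experiment", "brain atlas construction"),
    ("activity", "align_warp run 1"),
    ("runtime", "align_warp finished with status 0"),
    ("machine", "node-3"),
]
print(view_for_role(record, "domain_scientist"))
```

Compared with ZOOM-style selection or SCOPE-style drill-down, such a mapping would let an initial view be chosen from the user's role rather than constructed or expanded by hand.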

References

[1] Roger S. Barga and Luciano A. Digiampietri. Automatic capture and efficient storage of eScience experiment provenance. Concurrency and Computation: Practice and Experience, 20(5), 2008. (doi: http://dx.doi.org/10.1002/cpe.1235).

[2] Carlos Scheidegger, David Koop, Emanuele Santos, Huy Vo, Steven Callahan, Juliana Freire, and Claudio Silva. Tackling the provenance challenge one layer at a time. Concurrency and Computation: Practice and Experience, 20(5):473-483, 2008. (doi: http://dx.doi.org/10.1002/cpe.1237).

[3] S. Cohen-Boulakia, O. Biton, S. Cohen, and S. Davidson. Addressing the provenance challenge using ZOOM. Concurrency and Computation: Practice and Experience, 20(5):497-506, 2008.

[4] K. Cheung, J. Hunter, A. Lashtabeg, and J. Drennan. SCOPE: a scientific compound object publishing and editing system. International Journal of Digital Curation, 3(2), 2008.

[5] J.J. Carroll, C. Bizer, P. Hayes, and P. Stickler. Named graphs, provenance and trust. In Proceedings of the 14th International Conference on World Wide Web, pages 613-622, 2005.