Towards Unified Provenance Granularities

As Open Data becomes commonplace, methods are needed to integrate disparate data from a variety of sources. Although Linked Data design holds promise for integrating data from around the world, integrators often struggle to provide appropriate transparency about their sources and transformations. Without this transparency, cautious consumers are unlikely to find enough information to trust third-party content. While capturing provenance in RPI's Linking Open Government Data project, we faced a common problem: only a portion of the provenance that is captured is effectively used. Using our water quality portal's use case as an example, we argue that one key to enabling provenance use is a better treatment of provenance granularity. To address this challenge, we have designed an approach that supports deriving abstracted provenance from granular provenance in an open environment. We describe the approach, show how it addresses the naturally occurring unmet provenance needs in a family of applications, and describe how it addresses similar problems in open provenance and open data environments.
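The following minimal Python sketch illustrates what "deriving abstracted provenance from granular provenance" can mean. It is not the authors' design. It assumes that each derived value carries a hypothetical fine-grained record (`GranularRecord` with `value_id`, `source`, and `activity` fields) and derives coarser, per-source provenance by grouping those records on source and activity. All class names, field names, and identifiers are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GranularRecord:
    """Hypothetical fine-grained provenance: one record per derived value."""
    value_id: str   # identifier of a single derived value
    source: str     # hypothetical identifier of the originating dataset
    activity: str   # hypothetical name of the transformation that produced it


@dataclass(frozen=True)
class AbstractRecord:
    """Hypothetical coarse-grained provenance: one record per (source, activity)."""
    source: str
    activity: str
    value_ids: frozenset  # kept so a consumer can drill back down to granular records


def abstract_provenance(records):
    """Group granular records by (source, activity) into abstract records."""
    groups = defaultdict(set)
    for r in records:
        groups[(r.source, r.activity)].add(r.value_id)
    # Keys are unique, so sorting never needs to compare the sets themselves.
    return [AbstractRecord(s, a, frozenset(ids)) for (s, a), ids in sorted(groups.items())]


# Illustrative data only: three values from two hypothetical datasets.
records = [
    GranularRecord("v1", "dataset-A", "convert"),
    GranularRecord("v2", "dataset-A", "convert"),
    GranularRecord("v3", "dataset-B", "convert"),
]
summary = abstract_provenance(records)
for rec in summary:
    print(rec.source, rec.activity, len(rec.value_ids))
```

Each abstract record keeps the set of `value_id`s it summarizes. A consumer can therefore start from the compact summary and consult the granular provenance only when a specific value needs closer scrutiny.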


Associated Projects

We present a semantic technology-based approach to emerging environmental information systems. We applied our Linked Data approach in the Tetherless World Constellation Semantic Water Quality Portal (TWC-SWQP). Our integration scheme uses a core domain ontology and integrates water data from different authoritative sources along with multiple regulation ontologies to enable pollution detection and monitoring. An OWL-based reasoning scheme identifies pollution events relative to user-chosen regulations. Our approach also captures and leverages provenance to improve transparency.

The Inference Web is a Semantic Web based knowledge provenance infrastructure that supports interoperable explanations of sources, assumptions, learned information, and answers as an enabler for trust.

The LOGD project investigates the role of Semantic Web technologies, especially Linked Data, in producing, enhancing, and using government data published on Data.gov and other websites. A large portion of the government data published on the Web is not necessarily ready for mashups. The Tetherless World Constellation (TWC) is now publishing over 8 billion RDF triples converted from hundreds of government-related datasets from Data.gov and other sources (e.g.

Citation