Parallel Identities for Managing Open Government Data


The widespread availability of Open Government Data is exposing significant challenges to trust in its unplanned applications. As data are accumulated, transformed, and presented through a chain of independent third parties, there is a growing need for sophisticated models of provenance. Significant progress has been made in describing data derivation, but that progress has been limited by an inability to distinguish between transformations that change content and transformations that simply change representation. We have found that Functional Requirements for Bibliographic Records (FRBR) can, when paired with a derivational provenance model like the World Wide Web Consortium's emerging PROV standard, successfully represent web resource accession, distinguish between transformations of content and format, and support verification using cryptographic digests. We show how cryptographic digest algorithms can provide an automated method and tools for coordinating the multiscale identity of information resources using FRBR concepts.
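The distinction between a change of content and a change of representation can be illustrated with a minimal sketch. It assumes a small tabular dataset serialized as both CSV and JSON, and it uses a hypothetical canonical form (sorted keys and sorted rows) as a stand-in for content-level identity; the function names and the canonicalization are illustrative assumptions, not the method or tools described in the paper.

```python
import csv
import hashlib
import io
import json


def manifestation_digest(data: bytes) -> str:
    """Digest of the exact byte stream; any change of format changes it."""
    return hashlib.sha256(data).hexdigest()


def expression_digest(rows: list[dict[str, str]]) -> str:
    """Digest of a canonical form of the content; stable across formats.

    Assumption: sorted keys, sorted rows, and compact JSON are used here
    only for illustration, not as the authors' canonicalization.
    """
    canonical = sorted(
        json.dumps(row, sort_keys=True, separators=(",", ":")) for row in rows
    )
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


def rows_from_csv(data: bytes) -> list[dict[str, str]]:
    """Parse CSV bytes into a list of string-valued records."""
    return [dict(row) for row in csv.DictReader(io.StringIO(data.decode("utf-8")))]


def rows_from_json(data: bytes) -> list[dict[str, str]]:
    """Parse a JSON array of objects into a list of string-valued records."""
    return [{key: str(value) for key, value in row.items()} for row in json.loads(data)]


# The same content in two representations, plus an edited copy of the CSV.
csv_bytes = b"agency,budget\nEPA,10\nNASA,20\n"
json_bytes = b'[{"budget": "20", "agency": "NASA"}, {"agency": "EPA", "budget": "10"}]'
edited_csv_bytes = b"agency,budget\nEPA,10\nNASA,25\n"

same_bytes = manifestation_digest(csv_bytes) == manifestation_digest(json_bytes)
same_content = (
    expression_digest(rows_from_csv(csv_bytes))
    == expression_digest(rows_from_json(json_bytes))
)
content_changed = (
    expression_digest(rows_from_csv(csv_bytes))
    != expression_digest(rows_from_csv(edited_csv_bytes))
)

print("same bytes:", same_bytes)
print("same content:", same_content)
print("content changed after edit:", content_changed)
```

In this sketch the byte-level digest plays the role of an identity at the FRBR manifestation level, while the canonical digest approximates identity at the expression level: a format conversion changes the first but not the second, and an edit to the data changes both.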


Date: February 7, 2012
Created By: James McCusker

Related Projects:

Inference Web
Principal Investigator: Deborah L. McGuinness
Description: The Inference Web is a Semantic Web based knowledge provenance infrastructure that supports interoperable explanations of sources, assumptions, learned information, and answers as an enabler for trust.
Provenance - if users (humans and agents) are to use and integrate data from unknown, uncertain, or multiple sources, they need provenance metadata for evaluation
Interoperability - more systems are using varied sources and multiple information manipulation engines, thus increasing interoperability requirements
Explanation/Justification - if information has been manipulated (e.g., by sound deduction or by heuristic processes), a trace of that manipulation should be available
Trust - if some sources are more trustworthy than others, trust ratings are desired
The Inference Web consists of two important components:
Proof Markup Language (PML) Ontology - Semantic Web based representation for exchanging explanations, including:
provenance information - annotating the sources of knowledge
justification information - annotating the steps for deriving the conclusions or executing workflows
trust information - annotating trustworthiness assertions about knowledge and sources
IW Toolkit - Web-based and standalone tools that help human users browse, debug, explain, and abstract the knowledge encoded in PML.
Linking Open Government Data (LOGD)
Principal Investigator: Deborah L. McGuinness and Jim Hendler
Description: The LOGD project investigates the role of Semantic Web technologies, especially Linked Data, in producing, enhancing and utilizing government data published on and other websites.
Population Science Grid (PopSciGrid)
Principal Investigator: Deborah L. McGuinness
Description: The National Cancer Institute’s (NCI) PopSciGrid Community Health Portal is an evolving platform demonstrating how health behavior, policy, and demographic data can be integrated, visualized, and communicated to empower communities and support new avenues of research and policy for cancer prevention and control. As a proof of concept for cyber-enabled population health research, the PopSciGrid Portal is designed to encourage trans-disciplinary collaboration, data harmonization, and development of new computational methods for disparate health related data.
Semantic Sea Ice Interoperability Initiative (SSIII)
Principal Investigator: Siri Jodha Singh Khalsa, Ruth Duerr, and Mark Parsons
Co-Investigator: Peter Fox and Deborah L. McGuinness
Description: SSIII is a National Science Foundation (NSF) funded effort to enhance the interoperability of sea ice data and to establish a network of practitioners working to enhance semantic interoperability of all Arctic data. SSIII is a collaborative project between NSIDC and the Rensselaer Polytechnic Institute (RPI) Tetherless World Constellation project. We seek to build on the work initiated under the International Polar Year (IPY) and create a community of practice working to improve interoperability within the Polar Information Commons (PIC), the Sustained Arctic Observing Network (SAON), and broader global systems.

Related Research Areas:

Health Informatics
Lead Professor: Deborah L. McGuinness

Health informatics is "the interdisciplinary study of the design, development, adoption and application of IT-based innovations in healthcare services delivery, management and planning." Procter, R. Dr. (Editor, Health Informatics Journal, Edinburgh, United Kingdom). (From the U.S. National Library of Medicine)

Concepts: None.
Inference And Trust
Lead Professor: Deborah L. McGuinness
Description: Inference And Trust
Concepts: Semantic Web
Knowledge Provenance
Lead Professor: Deborah L. McGuinness
Description: Knowledge Provenance
Concepts: Provenance, Semantic Web
Semantic eScience
Lead Professor: Peter Fox
Science has fully entered a new mode of operation. E-science, defined as a combination of science, informatics, computer science, cyberinfrastructure, and information technology, is changing the way all of these disciplines do both their individual and collaborative work.
As semantic technologies have been gaining momentum in various e-Science areas (for example, W3C's new interest group for semantic web health care and life science), it is important to offer semantic-based methodologies, tools, and middleware to facilitate scientific knowledge modeling, logic-based hypothesis checking, semantic data integration and application composition, and integrated knowledge discovery and data analysis for different e-Science applications.
Partially influenced by the Artificial Intelligence community, Semantic Web researchers have largely focused on formal aspects of semantic representation languages or general-purpose semantic application development, with inadequate consideration of requirements from specific science areas. On the other hand, general science researchers are growing ever more dependent on the web, but they have no coherent agenda for exploring the emerging trends in semantic web technologies. This gap urgently calls for the development of a multi-disciplinary field to foster the growth of e-Science applications based on semantic technologies and related knowledge-based approaches.