Ontology engineering for provenance enablement in the third National Climate Assessment



Every four years, the U.S. Global Change Research Program (USGCRP) [1] produces a National Climate Assessment (NCA) report that presents findings on global climate change and its impacts on the United States. Global change research builds on a very large body of scientific work, which in turn generates provenance information about the entities, activities, and people involved in producing datasets, methods, and findings. Capturing and presenting this provenance, with links to the research papers, datasets, models, analyses, observations, satellites, and other resources that support the key findings, can increase understanding of, credibility of, and trust in the assessment process and the resulting report, and can aid the reproducibility of results and conclusions. The USGCRP is now producing the third NCA report (NCA3) and is developing a Global Change Information System (GCIS) that will present the content of that report and its provenance, including the scientific support for the findings of the assessment. Because the GCIS will be built on the Internet, it provides a platform for representing the provenance information and implementing the results with semantic web technologies. We are using a use case-driven iterative development methodology [2] to present this information through both a human-accessible web site and a machine-readable interface for automated mining of the provenance graph. A use case describes an objective that a primary actor wants to accomplish and the sequence of interactions between the primary actor and a system through which that objective is achieved. It sets up a context in which domain scientists and computer scientists can work together on a computer system.
Key steps in the iterative methodology include drafting a use case, assembling a team, developing ontologies for the use case, reviewing and iterating on the ontologies, adopting a technical infrastructure, building a rapid prototype, evaluating and iterating on all of this work, and preparing for the next use case. On the technical side, we use the evolving World Wide Web Consortium (W3C) PROV data model and ontology [3] to represent provenance information in the GCIS. Our ongoing research concentrates on provenance for the NCA3 report. Following the iterative development methodology, we have worked through a number of use cases to refine an ontology that describes the entities, activities, agents, and their inter-relationships in the NCA3 report. We have also mapped those entities and relationships to the PROV-O ontology to give the provenance a formal representation. Several prototype systems have been developed that let users browse and search provenance information by topic of interest. In the future, the GCIS will collect and link records of publications, datasets, instruments, organizations, methods, people, etc., eventually covering provenance information for the entire scope of global change.

References
[1] http://www.globalchange.gov
[2] http://tw.rpi.edu/web/doc/TWC_SemanticWebMethodology
[3] http://www.w3.org/TR/2012/WD-prov-overview-20121211
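
As a rough illustration of the mapping to PROV-O described in the abstract above, the following Python sketch stores a handful of provenance statements as subject-predicate-object triples and walks them to list everything that supports a finding. It is not the GCIS implementation: every "gcis:" identifier and both helper functions (build_index, trace) are hypothetical names invented for this example. Only the "prov:" property names are taken from the W3C PROV-O vocabulary.

```python
# Minimal sketch of a provenance graph, under stated assumptions:
# all "gcis:" identifiers are hypothetical and do not come from the GCIS
# or the NCA3 report. The "prov:" property names are PROV-O terms.

TRIPLES = [
    # entity -> entity
    ("gcis:finding-1", "prov:wasDerivedFrom", "gcis:figure-1"),
    ("gcis:figure-1", "prov:wasDerivedFrom", "gcis:dataset-1"),
    # entity -> activity
    ("gcis:dataset-1", "prov:wasGeneratedBy", "gcis:analysis-1"),
    # entity -> agent
    ("gcis:dataset-1", "prov:wasAttributedTo", "gcis:organization-1"),
    # activity -> entity
    ("gcis:analysis-1", "prov:used", "gcis:observations-1"),
    # activity -> agent
    ("gcis:analysis-1", "prov:wasAssociatedWith", "gcis:researcher-1"),
]


def build_index(triples):
    """Group (predicate, object) pairs by subject for quick lookup."""
    index = {}
    for subject, predicate, obj in triples:
        index.setdefault(subject, []).append((predicate, obj))
    return index


def trace(start, triples):
    """Return every resource reachable from `start` by following
    provenance links, in depth-first visit order (excluding `start`)."""
    index = build_index(triples)
    visited = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.append(node)
        # Push in reverse so links are explored in their listed order.
        for _predicate, obj in reversed(index.get(node, [])):
            stack.append(obj)
    return visited[1:]


if __name__ == "__main__":
    for resource in trace("gcis:finding-1", TRIPLES):
        print(resource)
```

Calling trace("gcis:finding-1", TRIPLES) lists the figure, dataset, analysis, observations, researcher, and organization behind the hypothetical finding, which is the kind of question a machine-readable provenance interface is meant to answer. A real system would more likely store the triples in an RDF store and query them with SPARQL; the plain-tuple representation here is only meant to keep the example self-contained.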


Date: July 16, 2013
Created By: Xiaogang Ma

Related Projects:

Global Change Information System: Information Model and Semantic Application Prototypes (GCIS-IMSAP)
Principal Investigator: Peter Fox
Description: The Tetherless World Constellation (TWC) at Rensselaer Polytechnic Institute (RPI) proposes to facilitate vocabulary and ontology development within the overall development of semantic prototypes for the National Climate Assessment (NCA) portals, using a combination of environmental inter-agency collaborations in a use-case focused workshop setting, information modeling, and software development and deployment. The prototypes are intended to provide search and browse options that inspire confidence that all relevant information has been found; data providers will be citable with detailed provenance generation. Expected deliverables are: information models, vocabulary and ontology services for vetted climate assessment settings, and search/browse prototypes.

Related Research Areas:

Data Frameworks
Lead Professor: Peter Fox
Description: None.
Concepts: eScience
Data Science
Lead Professor: Peter Fox
Description: Science has fully entered a new mode of operation. Data science is advancing the inductive conduct of science, driven by the greater volumes, complexity and heterogeneity of data being made available over the Internet. Data science combines aspects of data management, library science, computer science, and physical science using supporting cyberinfrastructure and information technology. As such it is changing the way all of these disciplines do both their individual and collaborative work.

Data science is helping scientists face new global problems of a magnitude, complexity and interdisciplinary nature whose progress is presently limited by the lack of available tools and of a fully trained and agile workforce.

At present, there is a lack of formal training in the key cognitive and skill areas that would enable graduates to become key participants in eScience collaborations. The need is to teach key methodologies in application areas based on real research experience and to build a skill set.

At the heart of this new way of doing science, especially experimental and observational science but also increasingly computational science, is the generation of data.

Concepts: eScience
Ontology Engineering Environments
Lead Professor: Deborah L. McGuinness
Description: Ontology Engineering Environments
Semantic eScience
Lead Professor: Peter Fox
Description: Science has fully entered a new mode of operation. E-science, defined as a combination of science, informatics, computer science, cyberinfrastructure and information technology, is changing the way all of these disciplines do both their individual and collaborative work.
As semantic technologies have been gaining momentum in various e-Science areas (for example, W3C's new interest group for semantic web health care and life science), it is important to offer semantic-based methodologies, tools, and middleware to facilitate scientific knowledge modeling, logic-based hypothesis checking, semantic data integration and application composition, and integrated knowledge discovery and data analysis for different e-Science applications.
Partially influenced by the Artificial Intelligence community, Semantic Web researchers have largely focused on formal aspects of semantic representation languages or on general-purpose semantic application development, with inadequate consideration of requirements from specific science areas. On the other hand, general science researchers are growing ever more dependent on the web, but they have no coherent agenda for exploring emerging trends in semantic web technologies. This situation urgently requires the development of a multi-disciplinary field to foster the growth and development of e-Science applications based on semantic technologies and related knowledge-based approaches.

Concepts: eScience
Web Science
Lead Professor: Jim Hendler, Deborah L. McGuinness
Description: Web Science is the study of the World Wide Web and its impact on both society and technology, positioning the Web as an object of scientific study unto itself. Web Science recognizes the Web as a transformational, disruptive technology; its practitioners study the Web, its components, facets and characteristics. Ultimately, Web Science is about understanding the Web and anticipating how it might evolve in the future.
Xinformatics
Lead Professor: Peter Fox
Description: In the last 2-3 years, informatics has attained greater visibility across a broad range of disciplines, especially in light of great successes in bio- and biomedical informatics and significant challenges posed by the explosion of data and information resources. Xinformatics is intended to provide both common informatics knowledge and an understanding of how it is implemented in specific disciplines, e.g. X = astro, geo, chem, etc. The theoretical basis of informatics arises from information science, cognitive science, social science, and library science as well as computer science. As such, it aggregates these studies and adds both the practice of information processing and the engineering of information systems.
Concepts: eScience