AGUFall10 Abstracts

From Semantic Portal Wiki

This page has been moved to the new drupal site. You should be editing it at http://tw.rpi.edu/web/event/AGU/FM/2010/Abstracts.


Info

Note: payment for abstract submission is required (credit card only).

Abstracts

James

Session IN20: Scientific Workflows and Provenance: Strategies for Current and Emerging Issues
status: submitted

Title: Extending eScience-based provenance with user-contributed Semantic Annotations

Authors: James Michaelis, Stephan Zednik, Patrick West, Peter Arthur Fox, Deborah L McGuinness

Institution: Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, NY, United States.

Abstract: eScience-based systems can generate provenance of their data products for communities of scientists to review. Recent advances in web-based technologies offer to expand the utility of such provenance by allowing users to annotate either the steps in a provenance trace for a data product or the data product itself. These users may have varying backgrounds, ranging from system experts to outside domain experts to citizen scientists. Additionally, such users may wish to make varying types of annotations, ranging from documenting the purpose of a provenance step to raising concerns about the quality of a data dependency. Semantic Web technologies allow such rich annotations to be made to provenance through the use of ontology vocabularies for (i) organizing provenance and (ii) organizing user/annotation classifications. Furthermore, through Linked Data practices, semantic linkages may be made from provenance steps to external data of interest. A desire for semantically annotated provenance was motivated by data management needs for the Mauna Loa Solar Observatory’s (MLSO) Advanced Coronal Observing System (ACOS). In ACOS, photometer-based readings are taken of solar activity and subsequently processed into final data products consumable by end users. At select stages of ACOS processing, different factors that could impact data product quality (e.g., expert evaluation, weather conditions) are logged. Linking such factors to provenance via annotation could significantly benefit end users. Likewise, the background of a user can impact the credibility of an annotation (e.g., a process description produced by a citizen scientist may not be as reliable as one made by an ACOS project member). For this work, we present a software package consisting of the following components:

  • Logging routines capable of recording the provenance of ACOS data products in the Proof Markup Language – a Semantic Web-based provenance model
  • A user/annotation classification ontology, designed to classify users / annotations for a given system
  • A browsing interface, designed to allow users to inspect PML-based provenance, as well as view and add annotations

While developed with ACOS-based provenance in mind, domain independence is preserved in this software package – making it easily extensible to other eScience systems.

Index Terms: [1970] INFORMATICS / Semantic web and semantic integration, [1960] INFORMATICS / Portals and user interfaces, [1902] INFORMATICS / Community modeling frameworks, [1908] INFORMATICS / Cyberinfrastructure.

Stephan

Session IN02: Enabling and Encouraging Transparency in Science Data
status: submitted

Title: A Semantic Provenance-aware Expert Advisory System in a Web-based Science Data Analysis Tool

Authors: Stephan Zednik(1), Chris Lynnes(2), Peter Fox(1), Gregory Leptoukh(2), Jianfu Pan(3)

1. Tetherless World Constellation, Rensselaer Polytechnic Inst., Troy, NY, United States
2. NASA Goddard Space Flight Center, Greenbelt, MD, United States
3. Adnet Systems, Inc.

Abstract: Web-based science analysis and processing tools allow users to access, analyze, and generate visualizations of data while relieving users of having to directly manage complex data processing operations. These tools provide value by streamlining the data analysis process, but usually shield users from details of the data processing steps, algorithm assumptions, caveats, etc. Correct interpretation of the final analysis requires that the user understand how data has been generated and processed and what potential biases, anomalies, or errors may have been introduced. By providing services that leverage data lineage provenance and domain expertise, expert systems can be built to aid the user in understanding data sources, processing, and the suitability for use of products generated by the tools.
As an example of such a system, we describe a semantic, provenance-aware, expert-knowledge advisory system applied to an existing web-based Earth science data analysis tool (e.g., Giovanni from NASA/GSFC). First we introduce our integrated semantic data model, which is composed of provenance, data processing, and science domain ontologies. Then we describe how we developed an initial set of expert rules to reason over our data model and discover conditions in the processing provenance that could lead to anomalies or errors in the processing results. Finally we will highlight how knowledge from the semantic data model and inferences of the advisory expert ruleset may be presented to the user to assist in understanding the suitability of products generated by the analysis tool.
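
The idea of expert rules that inspect processing provenance can be illustrated with a minimal Python sketch. It is not the rule language or data model used by the authors: the record fields (`id`, `operation`, `inputs`, `resolution_deg`) and the single rule shown are hypothetical, chosen only to show how a rule might flag a condition in a provenance trace and produce an advisory for the user.

```python
from typing import Any, Callable, Dict, List, Optional

# A provenance step is modeled here as a plain dict (hypothetical schema).
Step = Dict[str, Any]
# A rule inspects one step and returns an advisory message, or None.
Rule = Callable[[Step], Optional[str]]


def mixed_resolution_rule(step: Step) -> Optional[str]:
    """Hypothetical rule: warn when an averaging step mixes input resolutions."""
    if step["operation"] != "average":
        return None
    resolutions = {inp["resolution_deg"] for inp in step["inputs"]}
    if len(resolutions) > 1:
        return (
            f"Step '{step['id']}' averages inputs with different "
            f"resolutions: {sorted(resolutions)}"
        )
    return None


def advise(provenance: List[Step], rules: List[Rule]) -> List[str]:
    """Apply every rule to every step and collect the advisories raised."""
    advisories = []
    for step in provenance:
        for rule in rules:
            message = rule(step)
            if message is not None:
                advisories.append(message)
    return advisories


# Illustrative trace: step s2 combines inputs of differing resolution.
provenance = [
    {"id": "s1", "operation": "subset",
     "inputs": [{"name": "a", "resolution_deg": 1.0}]},
    {"id": "s2", "operation": "average",
     "inputs": [{"name": "a", "resolution_deg": 1.0},
                {"name": "b", "resolution_deg": 0.25}]},
]
advice = advise(provenance, [mixed_resolution_rule])
```

Keeping each rule as an independent function means new expert knowledge can be added without touching the code that walks the provenance.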

Cynthia

Session IN20: Scientific Workflows and Provenance: Strategies for Current and Emerging Issues
status: submitted

Title: Experiences Developing A User-centric Presentation of A Domain-enhanced Provenance Data Model

Authors: Cynthia Chang(1), Stephan Zednik (Presenting) (1), Chris Lynnes(2), Peter Fox(1), Deborah L. McGuinness(1), Gregory Leptoukh(2), Jianfu Pan(3)

1. Tetherless World Constellation, Rensselaer Polytechnic Inst., Troy, NY, United States
2. NASA Goddard Space Flight Center, Greenbelt, MD, United States
3. Adnet Systems, Inc.

Abstract: Web-based science analysis and processing tools allow users to access, analyze, and generate visualizations of data without requiring the user to manage data processing. These tools streamline science analysis activities by significantly reducing the data processing overhead for the user. The benefits of these tools come with a cost: the increased need for transparency in what data processing the tool performed on behalf of the user. By providing a clear explanation of what processing was performed and what domain knowledge (assumptions, caveats, etc.) modulated that processing, we can increase user trust, understanding, and accountability and reduce misinterpretation or generation of inconsistent results.
We will describe our knowledge provenance solution infrastructure in action. A demonstration will include presentation capabilities using an integrated semantic data model, supporting provenance and science domain models, applied to an existing web-based Earth science data analysis tool (e.g., Giovanni from NASA/GSFC). We will explain how interactions with tool users led us to the conclusion that user-accessible visual presentations of the integrated semantic data model, which expose data provenance and show how it is connected with and enhanced by domain-specific knowledge, are key to building a meaningful presentation of processing provenance. We will also describe how this belief guided our visualization development.

Patrick

CONTROL ID: 974802
TITLE: Presenting Provenance Based on User Roles - Experiences from the ACOS System.
PRESENTATION TYPE: Assigned by Committee (Oral or Poster)
CURRENT SECTION/FOCUS GROUP: Earth and Space Science Informatics (IN)
CURRENT SESSION: IN20. Scientific Workflows and Provenance: Strategies for Current and Emerging Issues
AUTHORS (FIRST NAME, LAST NAME): Patrick West(1), James Michaelis(1), Peter Arthur Fox(1), Stephan Zednik(1), Deborah L McGuinness(1)
INSTITUTIONS (ALL): 1. Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, NY, United States.
ABSTRACT BODY: One goal of provenance is to provide users an understanding of the steps a system took to generate data products. Here, the level of detail captured by provenance becomes an important consideration. As detail is added, more questions can, in principle, be addressed. However, presenting significant provenance detail may also overwhelm end users, for one of two reasons: (i) the detail presented is irrelevant to the user's objectives, or (ii) the detail requires background knowledge a user lacks.

Both of these challenges are present for data generated by the Mauna Loa Solar Observatory’s (MLSO) Advanced Coronal Observing System (ACOS). In ACOS, photometer-based readings are taken of solar activity and subsequently processed into data products consumable by end users. To fully understand these sequences of steps, background knowledge corresponding to various areas (e.g., astronomy, digital imaging, and ACOS specific techniques) is required by end users. This makes reviewing provenance difficult for users outside the ACOS development team, where varying degrees of background may be expected (ranging from outside domain experts in Solar Physics to citizen scientists). Likewise, even when steps taken by ACOS are understandable, they may provide undesired detail to an end user if presented.

The work with ACOS involved the development of a Semantic Web-based framework to selectively present provenance detail for data products in ACOS. Here, provenance is captured according to two sets of ontologies: the Proof Markup Language, which is an ontology-based, domain-independent provenance model, and a step ontology, designed to capture hierarchies of provenance steps. Used in combination, these ontology sets enable the creation of multiple levels of provenance, ranging from coarse- to fine-grained detail. In this setting, users may choose to expand or collapse provenance steps to view desired details. However, the specific provenance details a user initially sees are determined by the user role adopted; roles are defined through a role ontology, in which certain sets of background from the step ontology are assumed. In the context of ACOS, three user roles have been identified: ACOS expert, someone with complete background knowledge; outside domain expert, someone with knowledge in Solar Physics but not in ACOS-specific techniques; and citizen scientist, with only basic domain knowledge.
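
As a rough illustration of role-dependent initial views, the Python sketch below models a step hierarchy in which each step is tagged with the background needed to understand it, and a role is a set of assumed backgrounds. The three roles come from the abstract; the background labels, step names, and the rule for which substeps start expanded are assumptions made for this example, not the ACOS ontologies themselves.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Step:
    name: str
    required_background: str  # hypothetical label for the knowledge needed
    substeps: List["Step"] = field(default_factory=list)


# Background each role is assumed to have (labels are illustrative).
ROLE_BACKGROUND: Dict[str, Set[str]] = {
    "acos_expert": {"basic", "solar_physics", "acos_techniques"},
    "outside_domain_expert": {"basic", "solar_physics"},
    "citizen_scientist": {"basic"},
}


def initial_view(step: Step, role: str, depth: int = 0) -> List[str]:
    """Return the indented step names initially shown to the given role.

    A substep starts expanded only if the role's assumed background
    covers it; otherwise it, and everything beneath it, starts collapsed.
    """
    known = ROLE_BACKGROUND[role]
    lines = ["  " * depth + step.name]
    for sub in step.substeps:
        if sub.required_background in known:
            lines.extend(initial_view(sub, role, depth + 1))
    return lines


# Hypothetical provenance trace with coarse- to fine-grained steps.
trace = Step("Produce coronal image", "basic", [
    Step("Calibrate photometer readings", "solar_physics", [
        Step("Apply instrument-specific correction", "acos_techniques"),
    ]),
    Step("Publish data product", "basic"),
])

citizen_view = initial_view(trace, "citizen_scientist")
expert_view = initial_view(trace, "acos_expert")
```

In this sketch a citizen scientist initially sees only the top-level step and the publishing step, while an ACOS expert sees all four; a real interface would still let any user expand the collapsed steps on demand.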

We present how we have enabled browsing of provenance through a semantically enabled framework, defined through the two ontology sets. We conclude by discussing how, although the framework was developed with ACOS-based provenance in mind, it preserves domain independence, which makes it easily extensible to other eScience systems.

Peter

  • invited for IN09: Use of Ontologies in Earth Science Informatics - Semantic rules and inference make a comeback, watch out query! (submitted)

As our experience has grown with the development of semantic data frameworks that feature a wide range of functionality, including smart and faceted search, data integration, knowledge provenance and explanation provision, we have needed to review and refine our approach to both the form of knowledge representation, i.e. ontologies, and their application use, i.e. as dictated by a range of use cases. In particular we have used different combinations of inference, semantic query and semantic rules for different purposes. This presentation discusses these experiences and the lessons learned to date in balancing three key elements: semantic expressivity, implementability and maintainability. We conclude with a look forward to changes in ontology languages, tools, and application uses that will influence our future choices.

  • IN02: Enabling and Encouraging Transparency in Science Data - Realities in Science Data and Information - Let's go for translucency (submitted)

The availability of rich and diverse science data and information sources is increasing at a staggering rate. In addition, many forms of those products are becoming valued and used by completely new consumers. These products range from just the "raw" data (examples such as the U.S. data.gov and climate system models) to highly quality-controlled and integrated products. With this mix of products, and diversity of uses, what is now becoming obvious is that issues such as fitness-for-purpose, quality and the need to establish some form of trust have not had sufficient attention paid to them by those providing and disseminating the data and information. The current demand is for transparency in data and information systems as well as at sources. Often this demand cannot be met at all, or only with significant resource allocations. This presentation will present the idea that transparency is not a realistic goal to address the questions that are being asked. Instead, filtered views into data and information systems, i.e. translucency, are both more realistic and more desirable. The issues surrounding the entire ecosystem of factors (e.g. explanation, verification, etc.) will be presented and discussed, as well as how modern informatics developments are addressing many of the required components.

Deborah

    • invited for IN09: Use of Ontologies in Earth Science Informatics - Ontologies Come of Age Revisited (submitted)

    • IN22 - Advances in Cyberinfrastructure for the Earth and Environmental Sciences (submitted)
Draft:

Progress toward a Semantic eScience Framework; building on advanced cyberinfrastructure.

Deborah McGuinness, Peter Fox, Patrick West, Eric Rozell, Stephan Zednik, Cynthia Chang and Jim Hendler

The configurable and extensible semantic eScience framework (SESF) has begun development and implementation of several semantic application components. Extensions and improvements to several ontologies have been made based on distinct interdisciplinary use cases ranging from solar physics to biological and chemical oceanography. Importantly, these semantic representations mediate access to a diverse set of existing and emerging cyberinfrastructure. Among the advances are the population of triple stores with web-accessible query services. A triple store is akin to a relational data store where the basic stored unit is a subject-predicate-object tuple. Access via a query is provided by SPARQL, the query language specified as a W3C Recommendation. Upon this middle tier of semantic cyberinfrastructure, we have developed several forms of semantic faceted search, including provenance-awareness. We report on the rapid advances in semantic technologies and tools and how we are sustaining the software path for the required technical advances as well as the ontology improvements and increased functionality of the semantic applications, including how they are integrated into web-based portals (e.g. Drupal) and web services. Lastly, we indicate future work direction and opportunities for collaboration.
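
To make the triple store description concrete, here is a toy in-memory store in Python. It is only a sketch of the subject-predicate-object idea and of pattern matching in the spirit of a SPARQL basic graph pattern; it is not SPARQL itself, not the SESF implementation, and the `ex:` identifiers are invented for the example.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Toy store whose basic stored unit is a subject-predicate-object tuple."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return triples matching the pattern; None acts as a variable."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
store.add("ex:dataset1", "rdf:type", "ex:Dataset")
store.add("ex:dataset1", "ex:wasGeneratedBy", "ex:calibrationStep")
store.add("ex:dataset2", "rdf:type", "ex:Dataset")

# Roughly comparable to: SELECT ?s WHERE { ?s rdf:type ex:Dataset }
datasets = [s for s, _, _ in store.match(predicate="rdf:type", obj="ex:Dataset")]
```

A production triple store adds indexing, named graphs, and a full SPARQL endpoint, but the single-pattern lookup above is the building block that such queries combine.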

Eric Rozell

  • Abstract Submission
    • S2S: a framework for integrating oceanographic data repositories
    • Semantics: framework powered by application ontology
    • Informatics: uses Semantics with existing technologies for data portal with future emphasis on data integration and analysis
    • Science: scientists are one of the actors in the use cases, as well as scientific data managers
    • Potential target meetings: IN22 (Cyberinfrastructure), IN18 (Data Access) <-- Ocean Science cosponsored --> IN09 (Ontologies), IN17 (Information Fusion), or general contributions for IN or OS. Deborah suggests the session she is co-convening - http://www.agu.org/meetings/fm10/program/scientific_session_search.php?show=detail&sessid=442 - Scientific Workflows and Provenance: Strategies for Current and Emerging Issues
    • Current Abstract:

Oceanographic research covers a broad range of science domains and requires a tremendous amount of cross-disciplinary collaboration. Advances in cyberinfrastructure are making it easier to share data across disciplines through the use of web services and community vocabularies. Best practices in the design of web services and vocabularies to support interoperability amongst science data repositories are only starting to emerge. Strategic design decisions in these areas are crucial to the creation of end-user data and application integration tools.

We present S2S, a novel framework for deploying customizable user interfaces to support the search and analysis of data from multiple repositories. Our research methods follow the Semantic Web methodology and technology development process developed by Fox et al. This methodology stresses the importance of close scientist-technologist interactions when developing scientific use cases, keeping the project well scoped and ensuring the result meets a real scientific need.

The S2S framework motivates the development of standardized web services with well-described parameters, as well as the integration of existing web services and applications in the search and analysis of data. S2S also encourages the use and development of community vocabularies and ontologies to support federated search and reduce the amount of domain expertise required in the data discovery process. S2S utilizes the Web Ontology Language (OWL) to describe the components of the framework, including web service parameters, and OpenSearch as a standard description for web services, particularly search services for oceanographic data repositories. We have created search services for an oceanographic metadata database, a large set of quality-controlled ocean profile measurements, and a biogeographic search service. S2S provides an application programming interface (API) that can be used to generate custom user interfaces, supporting data and application integration across these repositories and other web resources. Although initially targeted towards a general oceanographic audience, the S2S framework shows promise in many science domains, inspired in part by the broad disciplinary coverage of oceanography.

This presentation will cover the challenges addressed by the S2S framework, the research methods used in its development, and the resulting architecture for the system. It will demonstrate how S2S is remarkably extensible, and can be generalized to many science domains. Given these characteristics, the framework can simplify the process of data discovery and analysis for the end user, and can help to shift the responsibility of search interface development away from data managers.

Evan Patton

Session: IN11 - Collaborative Frameworks in Earth and Space Sciences

Title: A Modular Framework for Transforming Structured Data into HTML with Machine-Readable Annotations

Authors: Evan Patton, Patrick West, Eric Rozell, Jin Zheng

Abstract:

There is a plethora of web-based Content Management Systems (CMS) available for maintaining projects and data, among other things. However, each system varies in its capabilities, and content is often stored separately and accessed via non-uniform web interfaces. Moving from one CMS to another (e.g., MediaWiki to Drupal) can be cumbersome, especially if a large quantity of data must be adapted to the new system. To standardize the creation, display, management, and sharing of project information, we have assembled a framework that uses existing web technologies to transform data provided by any service that supports SPARQL Protocol and RDF Query Language (SPARQL) queries into HTML fragments, allowing the data to be embedded in any existing website. The framework utilizes a two-tier Extensible Stylesheet Language Transformation (XSLT) that uses existing ontologies (e.g., Friend-of-a-Friend, Dublin Core) to interpret query results and render them as HTML documents. These ontologies can be used in conjunction with custom ontologies suited to individual needs (e.g., domain-specific ontologies for describing data records). Furthermore, this transformation process encodes machine-readable annotations, namely the Resource Description Framework in attributes (RDFa), into the resulting HTML, so that capable parsers and search engines can extract the relationships between entities (e.g., people, organizations, datasets). To facilitate editing of content, the framework provides a web-based form system, mapping each query to a dynamically generated form that can be used to modify and create entities while keeping the native data store up to date. This open framework makes it easy to duplicate data across many different sites, allowing researchers to distribute their data in many different online forums.
In this presentation we will outline the structure of queries and the stylesheets used to transform them, followed by a brief walkthrough that follows the data from storage to human- and machine-accessible web page. We conclude with a discussion on content caching and steps toward performing queries across multiple domains.
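
The transformation from query results to annotated HTML can be sketched in a few lines. The example below uses Python rather than the two-tier XSLT described in the abstract, and it assumes query results have already been reduced to a list of dictionaries with hypothetical `person` and `name` keys; the `example.org` URIs are placeholders. It shows only how RDFa attributes (`prefix`, `about`, `typeof`, `property`) can carry FOAF terms inside an embeddable HTML fragment.

```python
from html import escape
from typing import Dict, List


def render_people(bindings: List[Dict[str, str]]) -> str:
    """Render query result rows as an HTML list annotated with RDFa.

    Each row is assumed to hold a 'person' URI and a 'name' literal.
    """
    items = []
    for row in bindings:
        uri = escape(row["person"], quote=True)  # safe inside an attribute
        name = escape(row["name"])               # safe as element text
        items.append(
            f'  <li about="{uri}" typeof="foaf:Person">'
            f'<span property="foaf:name">{name}</span></li>'
        )
    return (
        '<ul prefix="foaf: http://xmlns.com/foaf/0.1/">\n'
        + "\n".join(items)
        + "\n</ul>"
    )


# Illustrative rows, shaped like simplified SPARQL SELECT results.
rows = [
    {"person": "http://example.org/people/1", "name": "Ada <Test>"},
    {"person": "http://example.org/people/2", "name": "Grace Example"},
]
fragment = render_people(rows)
```

Because the fragment carries its own `prefix` declaration, it can be dropped into a page from any CMS and still be read by an RDFa-aware parser.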

Qing Liu

Title: A Provenance Model for Real-Time Water Information Systems
Authors: Q. Liu(1), Q. Bai(1), S. Zednik(2), P. Taylor(1), P. Fox(2), K. Taylor(3), C. Kloppers(1), C. Peters(1), A. Terhost(1), P. West(2), M. Compton(3), Y. Shu(1)
Affiliations:

  1. Tasmanian ICT Centre, CSIRO, Hobart, Australia
  2. Tetherless World Constellation, Rensselaer Polytechnic Inst., Troy, NY, United States
  3. Information Engineering Lab, ICT Centre, CSIRO, Canberra, Australia

Abstract:
Generating hydrological data products, such as flow forecasts, involves complex interactions among instruments, data simulation models, computational facilities and data providers. Correct interpretation of the data produced at various stages requires a good understanding of how data was generated or processed. Provenance describes the lineage of a data product. Making provenance information accessible to hydrologists and decision makers not only helps to determine the data’s value, accuracy and authorship, but also enables users to determine the trustworthiness of the data product. In the water domain, WaterML2 [1] is an emerging standard which describes an information model and format for the publication of water observations data in XML. The W3C Semantic Sensor Network Incubator Group (SSN-XG) [3] is producing ontologies for the description of sensor configurations. By integrating domain knowledge of this kind into the provenance information model, the integrated information model will enable water domain researchers and water resource managers to better analyse how observations and derived data products were generated. We first introduce the Proof Markup Language (PML2) [2], WaterML2 and the SSN-XG sensor ontology as the proposed provenance representation formalism. Then we describe some initial implementations showing how these standards could be integrated to represent the lineage of water information products. Finally we will highlight how the provenance model for a distributed real-time water information system assists the interpretation of the data product and helps establish trust.
References:
[1] Taylor, P., Walker, G., Valentine, D., Cox, S.: WaterML2.0: Harmonising standards for water observation data. Geophysical Research Abstracts, Vol. 12.
[2] da Silva, P.P., McGuinness, D.L., Fikes, R.: A proof markup language for semantic web services. Inf. Syst. 31(4) (2006), 381-395.
[3] W3C Semantic Sensor Network Incubator Group http://www.w3.org/2005/Incubator/ssn/charter
Acknowledgement
The Tasmanian ICT Centre is jointly funded by the Australian Government through the Intelligent Island Program and CSIRO. The Intelligent Island Program is administered by the Tasmanian Department of Economic Development, Tourism and the Arts.
